Abstract

We establish the restricted isometry property for finite-dimensional Gabor systems, that is, for families of time-frequency shifts of a randomly chosen window function. We show that the $s$-th order restricted isometry constant of the associated $n \times n^2$ Gabor synthesis matrix is small provided that $s \le c\, n^{2/3} / \log^2 n$. This improves on previous estimates that exhibit quadratic scaling of $n$ in $s$. Our proof develops bounds for a corresponding chaos process.
1 Introduction and statements of results

Sparsity has become a key concept in applied mathematics and engineering. This is largely due to the empirical observation that a large number of real-world signals can be represented well by a sparse expansion in an appropriately chosen system of basic signals. Compressive sensing [9, 11, 13, 19, 21, 44] predicts that a small number of linear samples suffices to capture all the information in a sparse vector and that, furthermore, we can recover the sparse vector from these samples using efficient algorithms. This discovery has a number of potential applications in signal processing, as well as other areas of science and technology.

Linear data acquisition is described by a measurement matrix. The restricted isometry property (RIP) [12, 13, 21, 44] is by now a standard tool for studying how efficiently the measurement matrix captures information about sparse signals. The RIP also streamlines the analysis of signal reconstruction algorithms, including $\ell_1$-minimization, greedy and iterative algorithms. To date, no deterministic constructions of measurement matrices are known that satisfy the RIP with the optimal scaling behavior; see, for example, the discussions in [44, Sec. 2.5] and [21, Sec. 5.1]. In contrast, a variety of random measurement matrices exhibit the RIP with optimal scaling, including Gaussian matrices and Rademacher matrices [3, 20, 47, 13].

Although Gaussian random matrices are optimal for sparse recovery [19, 25], they have limited use in practice because many applications impose structure on the matrix. Furthermore, recovery algorithms are significantly more efficient when the matrix admits a fast matrix-vector multiplication. For example, random sets of rows from a discrete Fourier transform matrix model the measurement process in MRI and other applications. These random partial Fourier matrices lead to fast recovery algorithms because they can utilize the FFT. It is known that a random partial Fourier matrix satisfies a near-optimal RIP [13, 49, 42, 44] with high probability; see also [44, 48] for some generalizations.

This paper studies another type of structured random matrix that arises from time-frequency analysis and has potential applications to the channel identification problem [41] in wireless communications and sonar [35, 50], as well as in radar [30]. The columns of the considered $n \times n^2$ matrix consist of all discrete time-frequency shifts of a random vector. Previous analysis of this matrix has provided bounds for the coherence [41], as well as nonuniform sparse recovery guarantees using $\ell_1$-minimization [45]. However, the best bounds on the restricted isometry constants available so far were derived from coherence bounds [41] and, therefore, exhibit a highly non-optimal quadratic scaling of $n$ in the sparsity $s$. This paper dramatically improves on these bounds. Such an improvement is important because the nonuniform recovery guarantees in [45] apply only to $\ell_1$-minimization, they do not provide stability of reconstruction, and they do not show the existence of a single time-frequency structured measurement matrix that is able to recover all sufficiently sparse vectors. It is also of theoretical interest whether Gabor systems, that is, the columns of our measurement matrix, can possess the restricted isometry property. Nevertheless, our results still fall short of the optimal scaling that one might hope for.

Our approach is similar to the recent restricted isometry analysis for partial random circulant matrices in [46]. As there, we bound a chaos process of order 2 by means of a Dudley type inequality for such processes due to Talagrand [53]. This requires estimating covering numbers of the set of unit norm $s$-sparse vectors with respect to two different metrics induced by the process. In contrast to [46], the specific structure of our problem does not allow us to reduce to the Fourier case and to apply the covering number estimates shown in [49].

This paper is organized as follows. In Section 1.1 we recall central concepts in compressive sensing. Section 1.2 introduces the time-frequency structured measurement matrices that are considered in this paper, and we state our main result, Theorem 1. Remarks on applications in wireless communications and radar, as well as the relation of this paper to previous work, are given in Sections 1.3 and 1.4, respectively. Sections 2, 3 and 4 provide the proof of Theorem 1.

1.1 Compressive sensing

In general, reconstructing $x = (x_1, \dots, x_N)^T \in \mathbb{C}^N$ from

$$y = Ax \in \mathbb{C}^n, \qquad (1)$$

where $A \in \mathbb{C}^{n \times N}$ and $n \ll N$ (in this paper, we have $N = n^2$), is impossible without substantial a priori information on $x$. In compressive sensing, the assumption that $x$ is $s$-sparse, that is, $\|x\|_0 := \#\{\ell : x_\ell \ne 0\} \le s$ for some $s \ll N$, is introduced to ensure uniqueness and efficient recoverability of $x$. More generally, under the assumption that $x$ is well approximated by a sparse vector, one asks whether an optimally sparse approximation to $x$ can be found efficiently.
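To make the two notions in this paragraph concrete, the following sketch (Python with NumPy; the function names are ours, chosen for illustration) computes $\|x\|_0$ and the best $s$-term approximation error $\sigma_s(x)_1 = \inf_{\|z\|_0 \le s} \|x - z\|_1$ that appears in (5) below. It relies on the elementary fact that an optimal $s$-sparse approximation keeps the $s$ entries of largest magnitude.

```python
import numpy as np


def l0_norm(x: np.ndarray) -> int:
    """||x||_0: the number of nonzero entries of x."""
    return int(np.count_nonzero(x))


def best_s_term_error_l1(x: np.ndarray, s: int) -> float:
    """sigma_s(x)_1 = min over s-sparse z of ||x - z||_1.

    An optimal z keeps the s largest-magnitude entries of x, so the error
    is the sum of the magnitudes of all remaining entries.
    """
    magnitudes = np.sort(np.abs(x))[::-1]  # decreasing order
    return float(np.sum(magnitudes[s:]))


x = np.array([0.0, 3.0, -0.1, 0.0, 2.0, 0.05])
print(l0_norm(x))                  # 4
print(best_s_term_error_l1(x, 2))  # approximately 0.15
```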

Reconstruction of a sparse vector $x$ by means of the $\ell_0$-minimization problem,

$$\min_{z} \|z\|_0 \quad \text{subject to } y = Az,$$

is NP-hard [36] and therefore not tractable. Consequently, a number of alternatives to $\ell_0$-minimization, for example, greedy algorithms [5, 23, 37, 54, 55], have been proposed in the literature. The most popular approach utilizes $\ell_1$-minimization [11, 15, 19], that is, the convex program

$$\min_{z} \|z\|_1 \quad \text{subject to } y = Az, \qquad (2)$$

is solved, where $\|z\|_1 = |z_1| + |z_2| + \dots + |z_N|$ denotes the usual $\ell_1$ vector norm.

To guarantee recoverability of the sparse vector $x$ in (1) by means of $\ell_1$-minimization and greedy algorithms, it suffices to establish the restricted isometry property (RIP) of the so-called measurement matrix $A$: define the restricted isometry constant $\delta_s$ of an $n \times N$ matrix $A$ to be the smallest positive number that satisfies

$$(1 - \delta_s)\|x\|_2^2 \le \|Ax\|_2^2 \le (1 + \delta_s)\|x\|_2^2 \quad \text{for all } x \text{ with } \|x\|_0 \le s. \qquad (3)$$

In words, (3) requires that all column submatrices of $A$ with at most $s$ columns are well conditioned. Informally, $A$ is said to satisfy the RIP of order $s$ when $\delta_s$ is "small".
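For very small matrices, $\delta_s$ can be computed directly from (3): it is the largest deviation of the eigenvalues of $A_S^* A_S$ from 1 over all column subsets $S$ with $|S| = s$ (smaller subsets need not be checked, by eigenvalue interlacing for principal submatrices). The brute-force sketch below (NumPy; the function name is ours) is exponential in $s$ and only illustrates the definition.

```python
import itertools

import numpy as np


def restricted_isometry_constant(A: np.ndarray, s: int) -> float:
    """Brute-force delta_s of A as defined in (3).

    For every support S with |S| = s, the eigenvalues of A_S^* A_S lie in
    [1 - delta, 1 + delta] exactly when ||A_S^* A_S - I||_{2->2} <= delta,
    so delta_s is the maximum of these spectral norms.
    """
    num_cols = A.shape[1]
    delta = 0.0
    for support in itertools.combinations(range(num_cols), s):
        A_S = A[:, list(support)]
        gram_deviation = A_S.conj().T @ A_S - np.eye(s)
        # Hermitian matrix: the spectral norm is the largest |eigenvalue|.
        eigenvalues = np.linalg.eigvalsh(gram_deviation)
        delta = max(delta, float(np.max(np.abs(eigenvalues))))
    return delta


# Two identical unit-norm columns make A badly conditioned on that support.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
print(restricted_isometry_constant(A, 1))  # 0.0 (all columns have unit norm)
print(restricted_isometry_constant(A, 2))  # 1 (up to rounding)
```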

Now, if the matrix $A$ obeys (3) with

$$\delta_{\kappa s} < \delta^* \qquad (4)$$

for suitable constants $\kappa \ge 1$ and $\delta^* < 1$, then many algorithms precisely recover any $s$-sparse vector $x$ from the measurements $y = Ax$. Moreover, if $x$ can be well approximated by an $s$-sparse vector, then for noisy observations

$$y = Ax + e, \quad \text{where } \|e\|_2 \le \tau,$$

these algorithms return a reconstruction $\hat{x}$ that satisfies an error bound of the form

$$\|x - \hat{x}\|_2 \le C_1 \frac{\sigma_s(x)_1}{\sqrt{s}} + C_2 \tau, \qquad (5)$$

where $\sigma_s(x)_1 = \inf_{\|z\|_0 \le s} \|x - z\|_1$ denotes the error of best $s$-term approximation in $\ell_1$, and $C_1, C_2$ are positive constants. For illustration, we include Table 1, which lists available values for the constants $\kappa$ and $\delta^*$ in (4) that guarantee (5) for several algorithms along with respective references.


Algorithm                      κ    δ*                          References
ℓ1-minimization (2)            2    3/(4 + √6) ≈ 0.4652         [8, 10, 12, 22]
CoSaMP                         4    √(2/(5 + √73)) ≈ 0.3843     [24, 54]
Iterative Hard Thresholding    3    1/2                         [5, 22]
Hard Thresholding Pursuit      3    1/√3 ≈ 0.5774               [23]

Table 1: Values of the constants κ and δ* in (4) that guarantee success for various recovery algorithms.
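As an illustration of the Iterative Hard Thresholding row of Table 1, here is a minimal NumPy sketch of the plain IHT iteration $x^{k+1} = H_s(x^k + A^*(y - Ax^k))$ with unit step size, where $H_s$ keeps the $s$ largest-magnitude entries. The test matrix $A = [I \,|\, F]$ (identity and unitary DFT, $n = 256$) is our own choice, not taken from the references: its coherence is $1/16$, so the coherence bound $\delta_s \le (s-1)\mu$ from Section 1.4 gives $\delta_{3s} \le 5/16 < 1/2$ for $s = 2$, which places it inside the regime of Table 1.

```python
import numpy as np


def hard_threshold(z: np.ndarray, s: int) -> np.ndarray:
    """H_s(z): keep the s entries of largest magnitude, set the rest to 0."""
    out = np.zeros_like(z)
    keep = np.argsort(np.abs(z))[-s:]
    out[keep] = z[keep]
    return out


def iterative_hard_thresholding(A, y, s, iterations=200):
    """Plain IHT with unit step size: x <- H_s(x + A^*(y - A x))."""
    x = np.zeros(A.shape[1], dtype=complex)
    for _ in range(iterations):
        x = hard_threshold(x + A.conj().T @ (y - A @ x), s)
    return x


n, s = 256, 2
F = np.fft.fft(np.eye(n), norm="ortho")  # unitary DFT matrix
A = np.hstack([np.eye(n), F])            # n x 2n, unit-norm columns
x_true = np.zeros(2 * n, dtype=complex)
x_true[3] = 1.0                          # one spike in the identity part
x_true[n + 10] = -0.7 + 0.2j             # one Fourier atom
x_hat = iterative_hard_thresholding(A, A @ x_true, s)
print(np.max(np.abs(x_hat - x_true)))    # tiny, at rounding level
```

The unit step size matches the iteration analyzed in the references for Table 1; practical implementations often add an adaptive step size, which we omit here.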


For example, Gaussian random matrices, that is, matrices that have independent, normally distributed entries with mean zero and variance one, have been shown [3, 13, 34] to have restricted isometry constants of $\frac{1}{\sqrt{n}} A$ that satisfy $\delta_s \le \delta$ with high probability provided that

$$n \ge C \delta^{-2} s \log(N/s).$$

That is, the number $n$ of Gaussian measurements required to reconstruct an $s$-sparse signal of length $N$ is linear in the sparsity and logarithmic in the ambient dimension. See [3, 13, 34, 21, 44] for precise statements and extensions to Bernoulli and subgaussian matrices. It follows from lower estimates of Gelfand widths that this bound on the required number of samples is optimal [17, 25, 26], that is, the log-factor must be present.

As discussed above, no deterministic construction of a measurement matrix is known that provides the RIP with optimal scaling of the recoverable sparsity $s$ in the number of measurements $n$. In fact, all available proofs of the RIP with close to optimal scaling require the measurement matrix to contain some randomness. In Table 2 we list the Shannon entropy (in bits) of various random matrices along with the available RIP estimates. Compared to Gaussian random matrices, the Gabor synthesis measurement matrices constructed in this paper introduce only a small amount of randomness: the measurement matrix depends only on the so-called Gabor window, a random vector of length $n$, which can be chosen to be a normalized copy of a Rademacher vector. Moreover, the random Gabor matrix provably provides scaling of $s$ roughly as $n^{2/3}$, which significantly improves on known deterministic constructions. Clearly, such scaling falls short of the optimal one, but we expect that it is possible to establish linear scaling of $s$ in $n$ up to log-factors, similar to Gaussian matrices or partial random Fourier matrices. However, such an improvement seems to require more powerful methods to estimate chaos processes than are presently available.
n × N measurement matrix       Shannon entropy                              RIP regime                     References
Gaussian                       nN · ½ log(2πe)                              s ≤ Cn / log N                 [3, 20, 49]
Rademacher entries             nN                                           s ≤ Cn / log N                 [3]
Partial Fourier matrix         N log₂ N − n log₂ n − (N − n) log₂(N − n)    s ≤ Cn / log⁴ N                [46, 49]
Partial circulant Rademacher   N                                            s ≤ Cn^(2/3) / log^(2/3) N     [46]
Gabor, Rademacher window       n                                            s ≤ Cn^(2/3) / log² n          this paper
Gabor, Alltop window           0                                            s ≤ C√n                        [41]

Table 2: List of measurement matrices that have been proven to satisfy the RIP, the scaling of the sparsity s in the number of measurements n, and the respective Shannon entropy of the (random) matrix.
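The entropy column can be evaluated for a concrete size. The sketch below (ours, for illustration) uses $n = 64$, $N = n^2$, and omits the Gaussian row, whose entry is a differential entropy. We read the partial Fourier entry as $N \cdot H(n/N)$ with $H$ the binary entropy function; this is an upper bound for $\log_2 \binom{N}{n}$, the entropy of a uniformly random choice of $n$ rows, which the code computes for comparison.

```python
import math


def log2_binomial(N: int, n: int) -> float:
    """log2 of the binomial coefficient C(N, n), via log-gamma."""
    return (math.lgamma(N + 1) - math.lgamma(n + 1)
            - math.lgamma(N - n + 1)) / math.log(2)


def entropy_bits(n: int) -> dict:
    """Entropy entries of Table 2 (in bits) for an n x n^2 matrix."""
    N = n * n
    return {
        "Rademacher entries": n * N,
        "Partial Fourier matrix": (N * math.log2(N) - n * math.log2(n)
                                   - (N - n) * math.log2(N - n)),
        "Partial circulant Rademacher": N,
        "Gabor, Rademacher window": n,
        "Gabor, Alltop window": 0,
    }


table = entropy_bits(64)
for name, bits in table.items():
    print(f"{name:30s} {bits:12.1f}")
print("log2 C(N, n) =", round(log2_binomial(64 * 64, 64), 1))
```

For $n = 64$ the Gabor window carries 64 bits, against roughly 476 bits for a random partial Fourier matrix and 262144 bits for a Rademacher matrix of the same size.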


1.2 Time-frequency structured measurement matrices

In this paper, we provide probabilistic estimates of the restricted isometry constants for matrices whose columns are time-frequency shifts of a randomly chosen vector. To define these matrices, we let $T$ denote the cyclic shift, also called translation operator, and $M$ the modulation operator, or frequency shift operator, on $\mathbb{C}^n$. They are defined by

$$(Th)_q = h_{q \ominus 1} \quad \text{and} \quad (Mh)_q = e^{2\pi i q/n} h_q = \omega^q h_q, \qquad (6)$$

where $\ominus$ is subtraction modulo $n$ and $\omega = e^{2\pi i/n}$. Note that

$$(T^k h)_q = h_{q \ominus k} \quad \text{and} \quad (M^\ell h)_q = e^{2\pi i \ell q/n} h_q = \omega^{\ell q} h_q. \qquad (7)$$

The operators $\pi(\lambda) = M^\ell T^k$, $\lambda = (k, \ell)$, are called time-frequency shifts, and the system $\{\pi(\lambda) : \lambda \in \mathbb{Z}_n \times \mathbb{Z}_n\}$, $\mathbb{Z}_n = \{0, 1, \dots, n-1\}$, of all time-frequency shifts forms a basis of the matrix space $\mathbb{C}^{n \times n}$ [32, 31].
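The definitions (6) and (7) translate directly into matrices. The following NumPy sketch (function names are ours) builds $T$, $M$ and $\pi(\lambda) = M^\ell T^k$, and checks numerically the commutation relation $MT = \omega\, TM$, which follows from (6), and the orthogonality $\operatorname{Tr}(\pi(\lambda')^* \pi(\lambda)) = n\, \delta_{\lambda,\lambda'}$, which shows that the $n^2$ time-frequency shifts form a basis of $\mathbb{C}^{n \times n}$.

```python
import numpy as np


def translation(n: int) -> np.ndarray:
    """Cyclic shift T with (T h)_q = h_{q-1 mod n}."""
    return np.roll(np.eye(n), 1, axis=0)


def modulation(n: int) -> np.ndarray:
    """Modulation M with (M h)_q = omega^q h_q, omega = exp(2 pi i / n)."""
    return np.diag(np.exp(2j * np.pi * np.arange(n) / n))


def tf_shift(k: int, l: int, n: int) -> np.ndarray:
    """Time-frequency shift pi(lambda) = M^l T^k for lambda = (k, l)."""
    T, M = translation(n), modulation(n)
    return np.linalg.matrix_power(M, l) @ np.linalg.matrix_power(T, k)


n = 4
omega = np.exp(2j * np.pi / n)
T, M = translation(n), modulation(n)

# Columns of V are the vectorized pi(lambda); V^* V = n I means the n^2
# shifts are pairwise orthogonal in the trace inner product, hence a basis.
V = np.stack([tf_shift(k, l, n).ravel() for l in range(n) for k in range(n)],
             axis=1)
gram = V.conj().T @ V
```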

We choose $\varepsilon \in \mathbb{C}^n$ to be a Rademacher or Steinhaus sequence, that is, a vector of independent random variables taking the values $+1$ and $-1$ with equal probability, respectively taking values uniformly distributed on the complex torus $S^1 = \{z \in \mathbb{C} : |z| = 1\}$. The normalized window is

$$g = \frac{1}{\sqrt{n}}\, \varepsilon,$$

and the set

$$\{\pi(\lambda) g : \lambda \in \mathbb{Z}_n \times \mathbb{Z}_n\} \qquad (8)$$

is called a full Gabor system with window $g$ [28]. The matrix $\Psi_g \in \mathbb{C}^{n \times n^2}$ whose columns list the members $\pi(\lambda) g$, $\lambda \in \mathbb{Z}_n \times \mathbb{Z}_n$, of the Gabor system is referred to as the Gabor synthesis matrix [16, 32, 40]. Note that $\Psi_g$ allows for fast matrix-vector multiplication algorithms based on the FFT. The main result of this paper addresses the restricted isometry constants of $\Psi_g$. Below, $\mathbb{E}$ denotes expectation and $\mathbb{P}$ the probability of an event.
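A minimal NumPy sketch of $\Psi_g$ and of the FFT-based product mentioned above. We order the columns so that column $\ell n + k$ holds $\pi(k, \ell) g$ (any fixed ordering of $\mathbb{Z}_n \times \mathbb{Z}_n$ would do), and we store a coefficient vector as an array `X[k, l]` $= x_{(k,\ell)}$; these conventions and the function names are our assumptions. The fast product uses $(\Psi_g x)_q = \sum_k g_{q \ominus k} \sum_\ell x_{(k,\ell)}\, \omega^{\ell q}$, where each inner sum over $\ell$ is $n$ times an inverse DFT, giving $O(n^2 \log n)$ operations instead of the $O(n^3)$ of a dense product.

```python
import numpy as np


def gabor_synthesis_matrix(g: np.ndarray) -> np.ndarray:
    """Dense Psi_g: column l*n + k equals pi(k, l) g = M^l T^k g."""
    n = g.size
    q = np.arange(n)
    Psi = np.empty((n, n * n), dtype=complex)
    for l in range(n):
        for k in range(n):
            # (M^l T^k g)_q = omega^(l q) g_{q-k mod n}
            Psi[:, l * n + k] = np.exp(2j * np.pi * l * q / n) * np.roll(g, k)
    return Psi


def gabor_apply_fft(g: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Psi_g x without forming Psi_g; X[k, l] holds x_(k, l)."""
    n = g.size
    # np.fft.ifft includes a 1/n factor, so V[k, q] = sum_l X[k, l] omega^(l q).
    V = n * np.fft.ifft(X, axis=1)
    return sum(np.roll(g, k) * V[k] for k in range(n))


n = 8
rng = np.random.default_rng(0)
g = rng.choice([-1.0, 1.0], size=n) / np.sqrt(n)  # normalized Rademacher window
Psi = gabor_synthesis_matrix(g)

X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
x = X.T.reshape(-1)  # entry l*n + k of x is X[k, l]
y_dense = Psi @ x
y_fast = gabor_apply_fft(g, X)
```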

Theorem 1 Let $\Psi_g \in \mathbb{C}^{n \times n^2}$ be a draw of the random Gabor synthesis matrix with normalized Steinhaus or Rademacher generating vector.

(a) The expectation of the restricted isometry constant $\delta_s$ of $\Psi_g$, $s \le n$, satisfies

$$\mathbb{E}\,\delta_s \le \max\left\{ C_1 \sqrt{\frac{s^{3/2}}{n}}\, \log s \,\sqrt{\log n},\; C_2\, \frac{s^{3/2} \log^{3/2} n}{n} \right\}, \qquad (9)$$

where $C_1, C_2 > 0$ are universal constants.
(b) For $0 < \lambda < 1$, we have

F(6,  >  E[(JS]  +  A)  <  e~x2/a2 ,  where  a2  =  ^  l0g”  ^  5  (10) 

n 

with $C_3 > 0$ being a universal constant.

With slight variations of the proof, one can show similar statements for normalized Gaussian or subgaussian random windows $g$.

Roughly speaking, $\Psi_g$ satisfies the RIP of order $s$ with high probability if $n \ge C s^{3/2} \log^3(n)$, or equivalently if

$$s \le c\, n^{2/3} / \log^2 n.$$

We expect that this is not the optimal estimate, but improving on it seems to require more sophisticated techniques than those pursued in this paper. There are known examples [33, 53] for which the central tool in this paper, the Dudley type inequality for chaos processes stated in Theorem 3, is not sharp. We may well be facing one of these cases here.

Numerical tests illustrating the use of $\Psi_g$ for compressive sensing are presented in [41]. They illustrate that $\Psi_g$ empirically performs very similarly to a Gaussian matrix.

1.3 Application in wireless communications and radar

An important task in wireless communications is to identify the communication channel at hand, that is, the channel operator, by probing it with a small number of known transmit signals, ideally a single probing signal. A common finite-dimensional model for the channel operator, which combines digital-to-analog conversion, the analog channel, and analog-to-digital conversion, is given by [4, 18, 27, 38]

$$\Gamma = \sum_{\lambda \in \mathbb{Z}_n \times \mathbb{Z}_n} x_\lambda \pi(\lambda).$$

Time shifts model delay due to multipath propagation, while frequency shifts model the Doppler effect due to a moving transmitter, receiver, and/or scatterers. Physical considerations often suggest that $x$ is rather sparse since the number of scatterers present can be assumed to be small in most cases. The same model is also used in sonar [35, 50] and radar [30].

Our task is to identify the coefficient vector $x$ from a single input-output pair $(g, \Gamma g)$. In other words, we need to reconstruct $\Gamma \in \mathbb{C}^{n \times n}$, or equivalently $x$, from its action $y = \Gamma g$ on a single vector $g$. Writing

$$y = \Gamma g = \sum_{\lambda \in \mathbb{Z}_n \times \mathbb{Z}_n} x_\lambda \pi(\lambda) g = \Psi_g x \qquad (11)$$

with unknown but sparse $x$, we arrive at a compressive sensing problem. In this setup, we clearly have the freedom to choose $g$, and we may choose it as a random Rademacher or Steinhaus sequence. Then the restricted isometry property of $\Psi_g$, as shown in Theorem 1, ensures recovery of sufficiently sparse $x$, and hence of the associated operator $\Gamma$.
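The identity (11) is easy to check numerically. In the sketch below (NumPy; names, sizes and channel coefficients are ours, for illustration) we draw a normalized Steinhaus window, build the channel operator $\Gamma = \sum_\lambda x_\lambda \pi(\lambda)$ from a 3-sparse coefficient vector, and confirm that the single observation $\Gamma g$ coincides with $\Psi_g x$, using the column ordering $\ell n + k$ for $\lambda = (k, \ell)$.

```python
import numpy as np


def tf_shift(k: int, l: int, n: int) -> np.ndarray:
    """pi(k, l) = M^l T^k as an n x n matrix."""
    T = np.roll(np.eye(n), 1, axis=0)                   # (T h)_q = h_{q-1}
    M = np.diag(np.exp(2j * np.pi * np.arange(n) / n))  # (M h)_q = omega^q h_q
    return np.linalg.matrix_power(M, l) @ np.linalg.matrix_power(T, k)


n = 6
rng = np.random.default_rng(1)
g = np.exp(2j * np.pi * rng.random(n)) / np.sqrt(n)  # normalized Steinhaus window

# Sparse channel: three (delay k, Doppler l) pairs with complex gains.
x = np.zeros(n * n, dtype=complex)
for (k, l), gain in {(0, 0): 1.0, (2, 1): 0.5j, (4, 5): -0.3}.items():
    x[l * n + k] = gain

Gamma = sum(x[l * n + k] * tf_shift(k, l, n) for l in range(n) for k in range(n))
Psi = np.stack([tf_shift(k, l, n) @ g for l in range(n) for k in range(n)], axis=1)
y = Gamma @ g  # the single probing observation
```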

Recovery of the sparse $x$ in (11) can also be interpreted as finding a sparse time-frequency representation of a given $y$ with respect to the window $g$. From an application point of view, though, the vectors considered here are not well suited to describe meaningful sparse time-frequency representations of $y$, as all windows $g$ that are known to guarantee the RIP of $\Psi_g$ are very poorly localized both in time and in frequency.

1.4 Relation with previous work

Time-frequency structured matrices $\Psi_g$ appeared in the study of frames with (near-)optimal coherence. Recall that the coherence of a matrix $A = (a_1 | \dots | a_N)$ with normalized columns $\|a_\ell\|_2 = 1$ is defined as

$$\mu := \max_{\ell \ne k} |\langle a_\ell, a_k \rangle|.$$

Choosing the Alltop window [1, 51] $g \in \mathbb{C}^n$ with entries $g_\ell = n^{-1/2} e^{2\pi i \ell^3/n}$, for $n \ge 5$ prime, yields $\Psi_g$ with coherence

$$\mu = \frac{1}{\sqrt{n}}.$$

Due to the general lower bound $\mu \ge \sqrt{\frac{N-n}{n(N-1)}}$ for an $n \times N$ matrix [51], which for $N = n^2$ reads $\mu \ge 1/\sqrt{n+1}$, this coherence is almost optimal. Together with the bound $\delta_s \le (s-1)\mu$ we obtain

$$\delta_s \le \frac{s-1}{\sqrt{n}}.$$

This requires a scaling $s \le C\sqrt{n}$ to achieve sufficiently small RIP constants and sparse recovery, which is clearly worse than the main result of this paper.
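A quick numerical check of the coherence claim, as a NumPy sketch (names are ours): for the prime $n = 7$, the Gabor system of the Alltop window $g_\ell = n^{-1/2} e^{2\pi i \ell^3/n}$ has coherence $1/\sqrt{7}$, slightly above the lower bound $1/\sqrt{n+1}$ for $N = n^2$. The resulting coherence-based bound $\delta_s \le (s-1)\mu$ is printed for a few values of $s$.

```python
import numpy as np


def gabor_synthesis_matrix(g: np.ndarray) -> np.ndarray:
    """Columns pi(k, l) g = M^l T^k g, ordered as l*n + k."""
    n = g.size
    q = np.arange(n)
    columns = [np.exp(2j * np.pi * l * q / n) * np.roll(g, k)
               for l in range(n) for k in range(n)]
    return np.stack(columns, axis=1)


def coherence(A: np.ndarray) -> float:
    """max_{j != k} |<a_j, a_k>| for a matrix with unit-norm columns."""
    G = np.abs(A.conj().T @ A)
    np.fill_diagonal(G, 0.0)
    return float(G.max())


n = 7  # prime, n >= 5
alltop = np.exp(2j * np.pi * np.arange(n) ** 3 / n) / np.sqrt(n)
mu = coherence(gabor_synthesis_matrix(alltop))
welch = np.sqrt((n * n - n) / (n * (n * n - 1)))  # equals 1/sqrt(n + 1)
print(mu, 1 / np.sqrt(n), welch)
for s in (2, 3):
    print(s, (s - 1) * mu)  # coherence-based bound on delta_s
```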

The coherence of $\Psi_g$ with Steinhaus sequence $g$ is estimated in [41] by

$$\mu \le c \sqrt{\frac{\log(n/\varepsilon)}{n}},$$

holding with probability at least $1 - \varepsilon$. As before, this does not give better than quadratic scaling of $n$ in $s$ in order to have small RIP constants $\delta_s$.

The following nonuniform recovery result for $\ell_1$-minimization with $\Psi_g$ and Steinhaus sequence $g$ was derived in [45].

Theorem 2 Let $x \in \mathbb{C}^{n^2}$ be $s$-sparse. Choose a Steinhaus sequence $g$ at random. Then with probability at least $1 - \varepsilon$, the vector $x$ can be recovered from $y = \Psi_g x$ via $\ell_1$-minimization provided

$$s \le C \frac{n}{\log(n/\varepsilon)}.$$

Clearly, the (optimal) almost linear scaling of $n$ in $s$ of this estimate is better than the RIP estimate of the main Theorem 1. However, the conclusion is weaker than what can be derived using the restricted isometry property: recovery in Theorem 2 is nonuniform in the sense that a given $s$-sparse vector can be recovered with high probability from a random draw of the matrix $\Psi_g$. It is not stated that a single matrix can recover all $s$-sparse vectors simultaneously. Moreover, nothing is said about the stability of recovery, while, in contrast, small RIP constants imply (5). Therefore, our main Theorem 1 is of high interest and importance, despite the better scaling in Theorem 2. Moreover, we expect that an improvement of the RIP estimate is possible, although it is presently not clear how this can be achieved.

Partial random circulant matrices are a different, but closely related, class of measurement matrices, studied in [29, 43, 44, 46]. They model convolution with a random vector followed by subsampling on an arbitrary (deterministic) set. The best estimate so far of the restricted isometry constants $\delta_s$ of such an $n \times N$ matrix, given in [46], requires $n \ge c(s \log N)^{3/2}$, similar to the main result of this paper. The corresponding analysis also requires bounding a chaos process, which is likewise achieved by the Dudley type bound of Theorem 3 below. Nonuniform recovery guarantees for partial random circulant matrices similar to Theorem 2 are contained in [43, 44]. The analysis of circulant matrices benefits from a simplified arithmetic in the Fourier domain, a tool not available to us in the case of Gabor synthesis matrices. Hence, the analysis presented here is more involved.


2 Expectation of the restricted isometry constants

We first estimate the expectation of the restricted isometry constants of the random Gabor synthesis matrix, that is, we prove Theorem 1(a). To this end, we first rewrite the restricted isometry constants $\delta_s$. Let $T = T_s = \{x \in \mathbb{C}^{n^2} : \|x\|_2 = 1, \|x\|_0 \le s\}$. Introduce the following semi-norm on Hermitian matrices $A$,

$$|||A|||_s = \sup_{x \in T_s} |x^* A x|.$$

Then the restricted isometry constants of $\Psi = \Psi_g$ can be written as

$$\delta_s = |||\Psi^*\Psi - I|||_s,$$


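where $I$ denotes the identity matrix. Observe that the Gabor synthesis matrix takes the block form

$$\Psi_g = \big( C_g \,\big|\, M C_g \,\big|\, M^2 C_g \,\big|\, \cdots \,\big|\, M^{n-1} C_g \big), \qquad C_g = \begin{pmatrix} g_0 & g_{n-1} & \cdots & g_1 \\ g_1 & g_0 & \cdots & g_2 \\ \vdots & \vdots & \ddots & \vdots \\ g_{n-1} & g_{n-2} & \cdots & g_0 \end{pmatrix},$$

that is, the column indexed by $\lambda = (k, \ell)$ has entries $(\Psi_g)_{q,\lambda} = (M^\ell T^k g)_q = \omega^{\ell q} g_{q \ominus k}$.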

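Our analysis in this section employs the representation

$$\Psi_g = \sum_{q=0}^{n-1} g_q A_q$$

with

$$A_0 = \left(\begin{array}{cccc|cccc|c|cccc} 1 & 0 & \cdots & 0 & 1 & 0 & \cdots & 0 & & 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 & 0 & \omega & \cdots & 0 & \cdots & 0 & \omega^{n-1} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots & & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 & 0 & 0 & \cdots & \omega^{n-1} & & 0 & 0 & \cdots & \omega^{(n-1)^2} \end{array}\right) = (I \,|\, M \,|\, M^2 \,|\, \cdots \,|\, M^{n-1}),$$

$$A_1 = \left(\begin{array}{cccc|cccc|c|cccc} 0 & \cdots & 0 & 1 & 0 & \cdots & 0 & 1 & & 0 & \cdots & 0 & 1 \\ 1 & \cdots & 0 & 0 & \omega & \cdots & 0 & 0 & \cdots & \omega^{n-1} & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & 1 & 0 & 0 & \cdots & \omega^{n-1} & 0 & & 0 & \cdots & \omega^{(n-1)^2} & 0 \end{array}\right) = (T \,|\, MT \,|\, M^2 T \,|\, \cdots \,|\, M^{n-1} T),$$

and so on. In short, for $q \in \mathbb{Z}_n$,

$$A_q = (T^q \,|\, M T^q \,|\, M^2 T^q \,|\, \cdots \,|\, M^{n-1} T^q). \qquad (12)$$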
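The representation $\Psi_g = \sum_q g_q A_q$ with $A_q$ as in (12), and the column identity $A_q e_\lambda = \pi(\lambda) e_q$ proved in Lemma 8 below, can be confirmed numerically. The NumPy sketch below (names are ours) uses the column ordering $\ell n + k$ for $\lambda = (k, \ell)$, which matches the block structure $(T^q \,|\, MT^q \,|\, \cdots)$.

```python
import numpy as np


def tf_ops(n: int):
    """Return (T, M) with (T h)_q = h_{q-1} and (M h)_q = omega^q h_q."""
    T = np.roll(np.eye(n), 1, axis=0)
    M = np.diag(np.exp(2j * np.pi * np.arange(n) / n))
    return T, M


def A_matrix(q: int, n: int) -> np.ndarray:
    """A_q = (T^q | M T^q | ... | M^{n-1} T^q), an n x n^2 matrix."""
    T, M = tf_ops(n)
    Tq = np.linalg.matrix_power(T, q)
    return np.hstack([np.linalg.matrix_power(M, l) @ Tq for l in range(n)])


def pi(k: int, l: int, n: int) -> np.ndarray:
    """Time-frequency shift M^l T^k."""
    T, M = tf_ops(n)
    return np.linalg.matrix_power(M, l) @ np.linalg.matrix_power(T, k)


n = 5
rng = np.random.default_rng(2)
g = rng.choice([-1.0, 1.0], size=n) / np.sqrt(n)
Psi = np.stack([pi(k, l, n) @ g for l in range(n) for k in range(n)], axis=1)
Psi_from_A = sum(g[q] * A_matrix(q, n) for q in range(n))
```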


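With $g = \varepsilon/\sqrt{n}$, we have $\Psi = n^{-1/2} \sum_q \varepsilon_q A_q$. Observe that

$$H := \Psi^*\Psi - I = -I + \frac{1}{n} \sum_{q',q=0}^{n-1} \overline{\varepsilon_{q'}}\, \varepsilon_q\, A_{q'}^* A_q.$$

Since $|\varepsilon_q| = 1$, the diagonal terms $q' = q$ sum to $\frac{1}{n} \sum_q A_q^* A_q$, which equals $I$ by (29) below. Hence

$$H = \frac{1}{n} \sum_{q' \ne q} \overline{\varepsilon_{q'}}\, \varepsilon_q\, A_{q'}^* A_q = \frac{1}{n} \sum_{q',q} \overline{\varepsilon_{q'}}\, \varepsilon_q\, W_{q',q}, \qquad (13)$$

where, for notational simplicity, we use here and in the following $W_{q',q} = A_{q'}^* A_q$ for $q' \ne q$ and $W_{q',q} = 0$ for $q' = q$. We employ the matrix $B(x) \in \mathbb{C}^{n \times n}$, $x \in T_s$, given by the matrix entries

$$B(x)_{q',q} = x^* W_{q',q}\, x. \qquad (14)$$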
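The algebra leading to (13) and (14) can be checked on a small example. The sketch below (NumPy; all names are ours) draws a Rademacher sequence $\varepsilon$, forms $\Psi = n^{-1/2} \sum_q \varepsilon_q A_q$, and verifies that $\Psi^*\Psi - I$ equals the off-diagonal sum in (13), and that $n\, x^* H x = \varepsilon^* B(x)\, \varepsilon$ for a unit-norm sparse $x$, which is the identity behind (15) and (16).

```python
import numpy as np


def A_matrix(q: int, n: int) -> np.ndarray:
    """A_q = (T^q | M T^q | ... | M^{n-1} T^q)."""
    T = np.roll(np.eye(n), 1, axis=0)
    M = np.diag(np.exp(2j * np.pi * np.arange(n) / n))
    Tq = np.linalg.matrix_power(T, q)
    return np.hstack([np.linalg.matrix_power(M, l) @ Tq for l in range(n)])


def B_matrix(x: np.ndarray, A: list) -> np.ndarray:
    """B(x)_{q',q} = x^* A_{q'}^* A_q x for q' != q, zero on the diagonal."""
    n = len(A)
    B = np.zeros((n, n), dtype=complex)
    for qp in range(n):
        for q in range(n):
            if qp != q:
                B[qp, q] = x.conj() @ (A[qp].conj().T @ (A[q] @ x))
    return B


n = 5
rng = np.random.default_rng(3)
eps = rng.choice([-1.0, 1.0], size=n)  # Rademacher sequence
A = [A_matrix(q, n) for q in range(n)]
Psi = sum(eps[q] * A[q] for q in range(n)) / np.sqrt(n)
H = Psi.conj().T @ Psi - np.eye(n * n)
H_13 = sum(np.conj(eps[qp]) * eps[q] * (A[qp].conj().T @ A[q])
           for qp in range(n) for q in range(n) if qp != q) / n

x = np.zeros(n * n, dtype=complex)
x[[1, 7, 18]] = [0.6, -0.48j, 0.64]  # a unit-norm 3-sparse vector
quadratic_form = n * (x.conj() @ H @ x)
chaos = eps.conj() @ B_matrix(x, A) @ eps
```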

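Then we have

$$n\, \mathbb{E}\,\delta_s = \mathbb{E} \sup_{x \in T_s} |Y_x| = \mathbb{E} \sup_{x \in T_s} |Y_x - Y_0|, \qquad (15)$$

where

$$Y_x = \varepsilon^* B(x)\, \varepsilon = \sum_{q' \ne q} \overline{\varepsilon_{q'}}\, \varepsilon_q\, x^* A_{q'}^* A_q x \qquad (16)$$

and $x \in T_s = \{x \in \mathbb{C}^{n \times n} : \|x\|_2 \le 1, \|x\|_0 \le s\}$. (Replacing $\|x\|_2 = 1$ by $\|x\|_2 \le 1$ does not change the supremum in (15), since $Y_x = n\, x^* H x$ is homogeneous of degree 2 in $x$; it ensures that $0 \in T_s$, so that $Y_0 = 0$.) A process of the type (16) is called a Rademacher or Steinhaus chaos process of order 2. In order to bound such a process, we use the following theorem; see, for example, [33, Theorem 11.22] or [53, Theorem 2.5.2], where it is stated for Gaussian processes and in terms of majorizing measure (generic chaining) conditions. The formulation below requires the operator norm $\|A\|_{2 \to 2} = \max_{\|x\|_2 = 1} \|Ax\|_2$ and the Frobenius norm $\|A\|_F = \operatorname{Tr}(A^*A)^{1/2} = \big(\sum_{j,k} |A_{j,k}|^2\big)^{1/2}$, where $\operatorname{Tr}(A)$ denotes the trace of a matrix $A$.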


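Theorem 3 Let $\varepsilon = (\varepsilon_1, \dots, \varepsilon_n)^T$ be a Rademacher or Steinhaus sequence, and let

$$Y_x := \varepsilon^* B(x)\, \varepsilon = \sum_{q',q=1}^{n} \overline{\varepsilon_{q'}}\, \varepsilon_q\, B(x)_{q',q}$$

be an associated chaos process of order 2, indexed by $x \in T$, where we additionally assume that $B(x)$ is Hermitian with zero diagonal, that is, $B(x)_{q,q} = 0$ and $B(x)_{q',q} = \overline{B(x)_{q,q'}}$. We define two (pseudo-)metrics on $T$,

$$d_1(x,y) = \|B(x) - B(y)\|_{2 \to 2}, \qquad d_2(x,y) = \|B(x) - B(y)\|_F.$$

Let $N(T, d_i, u)$ be the minimum number of balls of radius $u$ in the metric $d_i$ needed to cover $T$. Then there exists a universal constant $K > 0$ such that, for an arbitrary $x_0 \in T$,

$$\mathbb{E} \sup_{x \in T} |Y_x - Y_{x_0}| \le K \max\left\{ \int_0^\infty \log N(T, d_1, u)\, du,\; \int_0^\infty \sqrt{\log N(T, d_2, u)}\, du \right\}. \qquad (17)$$

Proof: For a Rademacher sequence, the theorem is stated in [46, Proposition 2.2]. If $\varepsilon$ is a Steinhaus sequence and $B$ a Hermitian matrix, then

$$\varepsilon^* B \varepsilon = \operatorname{Re}(\varepsilon^* B \varepsilon) = \operatorname{Re}(\varepsilon)^* \operatorname{Re}(B) \operatorname{Re}(\varepsilon) - \operatorname{Re}(\varepsilon)^* \operatorname{Im}(B) \operatorname{Im}(\varepsilon) + \operatorname{Im}(\varepsilon)^* \operatorname{Im}(B) \operatorname{Re}(\varepsilon) + \operatorname{Im}(\varepsilon)^* \operatorname{Re}(B) \operatorname{Im}(\varepsilon).$$

By decoupling, see, for example, [39, Theorem 3.1.1], we have, with $\varepsilon'$ denoting an independent copy of $\varepsilon$,

$$\begin{aligned}
\mathbb{E} \sup_{x \in T} |\operatorname{Re}(\varepsilon)^* \operatorname{Im}(B(x)) \operatorname{Im}(\varepsilon)| &\le 8\, \mathbb{E} \sup_{x \in T} |\operatorname{Re}(\varepsilon)^* \operatorname{Im}(B(x)) \operatorname{Im}(\varepsilon')| \\
&\le 8\, \mathbb{E} \sup_{x \in T} |\xi^* \operatorname{Im}(B(x)) \operatorname{Im}(\varepsilon')| \le 8\, \mathbb{E} \sup_{x \in T} |\xi^* \operatorname{Im}(B(x))\, \xi'|,
\end{aligned}$$

where $\xi, \xi'$ denote independent Rademacher sequences. The second and third inequalities follow from the contraction principle [33, Theorem 4.4] (and symmetry of $\operatorname{Re}(\varepsilon_q)$, $\operatorname{Im}(\varepsilon_q)$), applied first conditionally on $\varepsilon'$ and then conditionally on $\xi$ (note that $|\operatorname{Re}(\varepsilon_q)| \le 1$ and $|\operatorname{Im}(\varepsilon_q)| \le 1$ for all realizations of $\varepsilon_q$). The same argument applies to the other three terms. Using the triangle inequality, we get

$$\mathbb{E} \sup_{x \in T} |Y_x - Y_{x_0}| \le 16\, \mathbb{E} \sup_{x \in T} |\xi^* (\operatorname{Re}(B(x)) - \operatorname{Re}(B(x_0)))\, \xi'| + 16\, \mathbb{E} \sup_{x \in T} |\xi^* (\operatorname{Im}(B(x)) - \operatorname{Im}(B(x_0)))\, \xi'|. \qquad (18)$$

Further note that $\|\operatorname{Im}(B(x)) - \operatorname{Im}(B(y))\|_F, \|\operatorname{Re}(B(x)) - \operatorname{Re}(B(y))\|_F \le \|B(x) - B(y)\|_F$ and, similarly, writing $B(x) - B(y)$ as a $2n \times 2n$ real block matrix acting on $\mathbb{R}^{2n}$, we see that also $\|\operatorname{Im}(B(x)) - \operatorname{Im}(B(y))\|_{2 \to 2}, \|\operatorname{Re}(B(x)) - \operatorname{Re}(B(y))\|_{2 \to 2} \le \|B(x) - B(y)\|_{2 \to 2}$. Furthermore, the statement for Rademacher chaos processes holds as well for decoupled chaos processes of the form above. (Indeed, its proof uses decoupling in a crucial way.) Therefore, the claim for Steinhaus sequences follows. ∎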
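The first identity in the proof, which splits the Steinhaus chaos $\varepsilon^* B \varepsilon$ into four real chaos terms, can be sanity-checked numerically. This NumPy sketch (names are ours) draws a random Hermitian $B$ with zero diagonal and a Steinhaus vector, compares both sides, and confirms that $\varepsilon^* B \varepsilon$ is real.

```python
import numpy as np


def four_term_decomposition(B: np.ndarray, eps: np.ndarray) -> float:
    """Right-hand side of the real-part identity in the proof of Theorem 3."""
    a, b = eps.real, eps.imag  # Re(eps), Im(eps)
    R, S = B.real, B.imag      # Re(B) symmetric, Im(B) antisymmetric
    return a @ R @ a - a @ S @ b + b @ S @ a + b @ R @ b


n = 6
rng = np.random.default_rng(4)
C = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = C + C.conj().T          # Hermitian
np.fill_diagonal(B, 0.0)    # zero diagonal, as required in Theorem 3
eps = np.exp(2j * np.pi * rng.random(n))  # Steinhaus sequence

lhs = eps.conj() @ B @ eps
rhs = four_term_decomposition(B, eps)
```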


Note that $B(x)$ defined in (14) satisfies the hypotheses of Theorem 3: it has zero diagonal by definition, and it is Hermitian since $\overline{B(x)_{q',q}} = x^* A_q^* A_{q'} x = B(x)_{q,q'}$ for $q' \ne q$. The pseudometrics are given by

$$d_2(x,y) = \|B(x) - B(y)\|_F = \Big( \sum_{q' \ne q} |x^* A_{q'}^* A_q x - y^* A_{q'}^* A_q y|^2 \Big)^{1/2}, \qquad (19)$$

and

$$d_1(x,y) = \|B(x) - B(y)\|_{2 \to 2}.$$

The bound on the expected restricted isometry constant then follows from the following estimates on the covering numbers of $T_s$ with respect to $d_1$ and $d_2$. The corresponding proofs are detailed in Section 3. We start with $N(T_s, d_2, u)$.
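For intuition about the two pseudometrics, the sketch below (NumPy; names are ours) evaluates $d_1$ and $d_2$ for two random unit-norm $s$-sparse vectors. Since the operator norm never exceeds the Frobenius norm, $d_1 \le d_2$; the code also checks that $B(x)$ is Hermitian with zero diagonal, and that $d_2(x, y)$ stays below the diameter bound $4\sqrt{sn}$ of Lemma 5.

```python
import numpy as np


def A_matrix(q: int, n: int) -> np.ndarray:
    """A_q = (T^q | M T^q | ... | M^{n-1} T^q)."""
    T = np.roll(np.eye(n), 1, axis=0)
    M = np.diag(np.exp(2j * np.pi * np.arange(n) / n))
    Tq = np.linalg.matrix_power(T, q)
    return np.hstack([np.linalg.matrix_power(M, l) @ Tq for l in range(n)])


def B_matrix(x: np.ndarray, A: list) -> np.ndarray:
    """B(x) from (14): off-diagonal entries x^* A_{q'}^* A_q x."""
    vecs = np.stack([Aq @ x for Aq in A])  # row q holds A_q x
    B = vecs.conj() @ vecs.T               # entry (q', q) = (A_{q'} x)^* (A_q x)
    np.fill_diagonal(B, 0.0)
    return B


def random_sparse_unit(n2: int, s: int, rng) -> np.ndarray:
    """A random unit-norm vector in C^{n2} with exactly s nonzero entries."""
    x = np.zeros(n2, dtype=complex)
    support = rng.choice(n2, size=s, replace=False)
    x[support] = rng.standard_normal(s) + 1j * rng.standard_normal(s)
    return x / np.linalg.norm(x)


n, s = 7, 3
rng = np.random.default_rng(5)
A = [A_matrix(q, n) for q in range(n)]
x, y = random_sparse_unit(n * n, s, rng), random_sparse_unit(n * n, s, rng)
Bx, By = B_matrix(x, A), B_matrix(y, A)
d1 = np.linalg.norm(Bx - By, 2)      # operator norm
d2 = np.linalg.norm(Bx - By, "fro")  # Frobenius norm
```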

Lemma 4 For $u > 0$, it holds

$$\log N(T_s, d_2, u) \le s \log(e n^2/s) + s \log(1 + 4\sqrt{sn}\, u^{-1}).$$

The above estimate is useful only for small $u > 0$. For large $u$ we require the following alternative bound.

Lemma 5 The diameter of $T_s$ with respect to $d_2$ is bounded by $4\sqrt{sn}$, and for $\sqrt{n} \le u \le 4\sqrt{sn}$, it holds

$$\log N(T_s, d_2, u) \le c\, u^{-2} n s^{3/2} \log(n s^{5/2} u^{-1}),$$

where $c > 0$ is a universal constant.

Covering number estimates with respect to $d_1$ are provided in the following lemma.

Lemma 6 The diameter of $T_s$ with respect to $d_1$ is bounded by $4s$, and for $u > 0$,

$$\log N(T_s, d_1, u) \le \min\left\{ s \log(e n^2/s) + s \log(1 + 4s u^{-1}),\; c\, u^{-2} s^2 \log(2n) \log(n^2/u) \right\}, \qquad (20)$$

where $c > 0$ is a universal constant.

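Based on these estimates and Theorem 3, we complete the proof of Theorem 1(a). By Lemmas 4 and 5, the subgaussian integral in (17) can be estimated as

$$\begin{aligned}
\int_0^\infty \sqrt{\log N(T_s, d_2, u)}\, du &= \int_0^{4\sqrt{sn}} \sqrt{\log N(T_s, d_2, u)}\, du \\
&= \int_0^{\sqrt{n}} \sqrt{\log N(T_s, d_2, u)}\, du + \int_{\sqrt{n}}^{4\sqrt{sn}} \sqrt{\log N(T_s, d_2, u)}\, du \\
&\le \int_0^{\sqrt{n}} \sqrt{s \log(e n^2/s)}\, du + \int_0^{\sqrt{n}} \sqrt{s \log(1 + 4\sqrt{sn}\, u^{-1})}\, du \\
&\quad + \sqrt{c\, s^{3/2} n} \int_{\sqrt{n}}^{4\sqrt{sn}} u^{-1} \sqrt{\log(n s^{5/2} u^{-1})}\, du \\
&\le \sqrt{sn \log(e n^2/s)} + 4 s \sqrt{n} \int_0^{(4\sqrt{s})^{-1}} \sqrt{\log(1 + u^{-1})}\, du \\
&\quad + \sqrt{c\, s^{3/2} n}\, \sqrt{\log(n^{1/2} s^{5/2})}\, \log(4\sqrt{s}) \\
&\le \sqrt{sn \log(e n^2/s)} + \sqrt{sn \log(e(1 + 4\sqrt{s}))} + c' \sqrt{s^{3/2} n \log(n) \log^2(s)} \\
&\le C_1 \sqrt{s^{3/2} n \log(n) \log^2(s)}. \qquad (21)
\end{aligned}$$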
Hereby, we have used [44, Lemma 10.3] and that $s \le n$. Due to Lemma 6, the subexponential integral obeys the estimate, for some $\kappa > 0$ to be chosen below,

$$\begin{aligned}
\int_0^\infty \log N(T_s, d_1, u)\, du &= \int_0^{4s} \log N(T_s, d_1, u)\, du \\
&= \int_0^{\kappa} \log N(T_s, d_1, u)\, du + \int_{\kappa}^{4s} \log N(T_s, d_1, u)\, du \\
&\le \kappa s \log(e n^2/s) + s \int_0^{\kappa} \log(1 + 4s u^{-1})\, du + c s^2 \log(2n) \int_{\kappa}^{4s} u^{-2} \log(n^2/u)\, du \\
&\le \kappa s \log(e n^2/s) + \kappa s \log(e(1 + 4s\kappa^{-1})) + c s^2 \kappa^{-1} \log(2n) \log(n^2/\kappa).
\end{aligned}$$

Choose $\kappa = \sqrt{s \log(n)}$ to reach

$$\int_0^\infty \log N(T_s, d_1, u)\, du \le C_2 s^{3/2} \log^{3/2}(n). \qquad (22)$$


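Combining the above integral estimates with (15) and Theorem 3 yields

$$\mathbb{E}\,\delta_s = \frac{1}{n} \mathbb{E} \sup_{x \in T_s} |Y_x - Y_0| \le \frac{1}{n} \max\left\{ C_1 \sqrt{s^{3/2} n \log(n) \log^2(s)},\; C_2 s^{3/2} \log^{3/2}(n) \right\}, \qquad (23)$$

where the constant $K$ from (17) has been absorbed into $C_1$ and $C_2$. This is the statement of Theorem 1(a).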
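Both entropy integrals above rely on an elementary integral bound. We do not reproduce [44, Lemma 10.3]; the following short derivation (ours) shows one way to obtain the inequalities in the form used in (21) and (22), for $\alpha > 0$.

```latex
% Antiderivative: d/dv [ v log(1 + 1/v) + log(1 + v) ] = log(1 + 1/v).
\int_0^{\alpha} \log(1 + v^{-1})\, dv
  = \alpha \log(1 + \alpha^{-1}) + \log(1 + \alpha)
  \le \alpha \log\big(e(1 + \alpha^{-1})\big)
  \quad \text{since } \log(1 + \alpha) \le \alpha .

% Cauchy-Schwarz then gives the subgaussian version:
\int_0^{\alpha} \sqrt{\log(1 + v^{-1})}\, dv
  \le \sqrt{\alpha} \Big( \int_0^{\alpha} \log(1 + v^{-1})\, dv \Big)^{1/2}
  \le \alpha \sqrt{\log\big(e(1 + \alpha^{-1})\big)} .

% With alpha = (4 sqrt(s))^{-1}, the second term in (21) becomes
4 s \sqrt{n} \int_0^{(4\sqrt{s})^{-1}} \sqrt{\log(1 + u^{-1})}\, du
  \le \sqrt{s n \log\big(e(1 + 4\sqrt{s})\big)} ,

% and with the substitution u = 4 s v and alpha = kappa / (4 s) in (22),
s \int_0^{\kappa} \log(1 + 4 s u^{-1})\, du
  = 4 s^2 \int_0^{\kappa/(4s)} \log(1 + v^{-1})\, dv
  \le \kappa s \log\big(e(1 + 4 s \kappa^{-1})\big) .
```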


Remark 7 In analogy to the estimate of a subgaussian entropy integral arising in the analysis of partial random circulant matrices in [46], we expect that the exponent 3/2 in (21) can be improved to 1. However, we doubt that such an improvement is possible for the subexponential integral (22) (indeed, the estimate of the subexponential integral in [46] also exhibits an exponent of 3/2 at the $s$-term), so we did not pursue an improvement of (21) here, as it would not provide a significant overall improvement of (23). We expect that an improvement of (23) would require more sophisticated tools than the Dudley type estimate for chaos processes of Theorem 3.


3 Proof of covering number estimates

In this section we provide the covering number estimates of Lemmas 4, 5 and 6, which are crucial to the proof of our main result. We first introduce additional notation. Let $\delta(m, k) = \delta_{0, m-k}$ and $\delta(m) = \delta_{0,m}$ be the Kronecker symbol as usual. We denote by $\operatorname{supp} x = \{\ell : x_\ell \ne 0\}$ the support of a vector $x$. Let $A$ be a matrix with vector of singular values $\sigma(A)$. For $0 < q \le \infty$, the Schatten $S_q$-norm is defined by

$$\|A\|_{S_q} := \|\sigma(A)\|_q, \qquad (24)$$

where $\|\cdot\|_q$ is the usual vector $\ell_q$ norm. For an integer $p$, the $S_{2p}$ norm can be expressed as

$$\|A\|_{S_{2p}} = \big(\operatorname{Tr}((A^*A)^p)\big)^{1/(2p)}. \qquad (25)$$

The $S_\infty$-norm coincides with the operator norm, $\|\cdot\|_{S_\infty} = \|\cdot\|_{2 \to 2}$. By the corresponding properties of $\ell_q$-norms we have the inequalities

$$\|A\|_{2 \to 2} \le \|A\|_{S_q} \le \operatorname{rank}(A)^{1/q} \|A\|_{2 \to 2}. \qquad (26)$$
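The Schatten norms in (24)-(26) can be computed from singular values. The NumPy sketch below (function name ours) checks (25) for $p = 2$, the identifications of $S_2$ with the Frobenius norm and of $S_\infty$ with the operator norm, and the inequalities (26), on a random complex $6 \times 4$ matrix.

```python
import numpy as np


def schatten_norm(A: np.ndarray, q: float) -> float:
    """||A||_{S_q} = ||sigma(A)||_q for 0 < q <= infinity."""
    sigma = np.linalg.svd(A, compute_uv=False)
    if np.isinf(q):
        return float(sigma.max())
    return float(np.sum(sigma ** q) ** (1.0 / q))


rng = np.random.default_rng(6)
A = rng.standard_normal((6, 4)) + 1j * rng.standard_normal((6, 4))
op_norm = np.linalg.norm(A, 2)
rank = np.linalg.matrix_rank(A)

# (25) with p = 2: ||A||_{S_4} = Tr((A^* A)^2)^{1/4}
AhA = A.conj().T @ A
s4_trace = np.trace(AhA @ AhA).real ** 0.25
```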

Moreover, we will require an extension of the quadratic form $B(x)$ in (14) to a bilinear form,

$$B(x,z)_{q',q} = \begin{cases} x^* A_{q'}^* A_q z & \text{if } q' \ne q, \\ 0 & \text{if } q' = q. \end{cases} \qquad (27)$$

Then $B(x) = B(x,x)$.
3.1 Time-frequency analysis on $\mathbb{C}^n$

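Before passing to the actual covering number estimates, we provide some facts and estimates related to time-frequency analysis on $\mathbb{C}^n$; all indices are taken modulo $n$. Observe that the matrices $A_q$ introduced in (12) satisfy

$$A_q^* = \begin{pmatrix} (T^q)^* \\ (MT^q)^* \\ (M^2T^q)^* \\ \vdots \\ (M^{n-1}T^q)^* \end{pmatrix} = \begin{pmatrix} T^{-q} \\ T^{-q}M^{-1} \\ T^{-q}M^{-2} \\ \vdots \\ T^{-q}M^{-(n-1)} \end{pmatrix}$$

and, hence,

$$(A_q^* y)_{(k,\ell)} = y_{k+q}\, \omega^{-\ell(k+q)}.$$

Clearly,

$$\langle A_q z, y \rangle = \langle z, A_q^* y \rangle = \sum_{k,\ell} z_{(k,\ell)}\, \overline{y_{k+q}}\, \omega^{\ell(k+q)} = \sum_{k,\ell} z_{(k-q,\ell)}\, \omega^{\ell k}\, \overline{y_k} = \sum_k \Big( \sum_\ell z_{(k-q,\ell)}\, \omega^{\ell k} \Big) \overline{y_k}$$

and, hence,

$$(A_q z)_k = \sum_\ell z_{(k-q,\ell)}\, \omega^{\ell k}.$$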

i 

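In the following, $\mathcal{F}: \mathbb{C}^n \to \mathbb{C}^n$ denotes the normalized Fourier transform, that is,

$$(\mathcal{F}v)_\ell = n^{-1/2}\sum_{q=0}^{n-1}\omega^{-q\ell}\,v_q.$$

For $v \in \mathbb{C}^{n\times n}$, $\mathcal{F}_2v$ denotes the Fourier transform of $v$ in the second variable.

Let $\{e_\lambda\}_{\lambda\in\mathbb{Z}_n\times\mathbb{Z}_n}$ and $\{e_q\}_{q\in\mathbb{Z}_n}$ denote the Euclidean bases of $\mathbb{C}^{n\times n}$ and $\mathbb{C}^n$, respectively, and let $P_\lambda$ denote the orthogonal projection onto the one-dimensional space $\operatorname{span}\{e_\lambda\}$. The following bounds will be crucial for the covering number estimates below.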
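The formulas for $A_q^*$, $A_q$ and $\mathcal{F}$ can be sanity-checked numerically. The sketch below is written under our own assumptions: $\omega = e^{2\pi i/n}$, the index $\lambda = (k,\ell)$ is flattened to position $k n + \ell$, and the hypothetical helper `A_matrix` builds $A_q$ column by column as $\omega^{\ell(k+q)}e_{k+q}$, which is what the formula for $(A_qz)_k$ above prescribes. It then compares both displayed formulas entrywise and checks that $\mathcal{F}$ is unitary and agrees with NumPy's orthonormal FFT.

```python
import numpy as np

n = 5
omega = np.exp(2j * np.pi / n)

def A_matrix(q):
    """Hypothetical helper: A_q as an n x n^2 array.

    Column k*n + l (the index lambda = (k, l)) equals omega^{l(k+q)} e_{k+q}.
    """
    A = np.zeros((n, n * n), dtype=complex)
    for k in range(n):
        for l in range(n):
            A[(k + q) % n, k * n + l] = omega ** ((l * (k + q)) % n)
    return A

# Normalized Fourier transform: (F v)_l = n^{-1/2} sum_q omega^{-q l} v_q.
F = np.array([[omega ** ((-q * l) % n) for q in range(n)] for l in range(n)]) / np.sqrt(n)

rng = np.random.default_rng(1)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)
z = rng.standard_normal(n * n) + 1j * rng.standard_normal(n * n)
for q in range(n):
    A = A_matrix(q)
    Ahy, Az = A.conj().T @ y, A @ z
    for k in range(n):
        for l in range(n):
            # (A_q^* y)_{(k,l)} = y_{k+q} omega^{-l(k+q)}
            assert np.isclose(Ahy[k * n + l], y[(k + q) % n] * omega ** ((-l * (k + q)) % n))
        # (A_q z)_k = sum_l z_{(k-q,l)} omega^{l k}
        expected = sum(z[((k - q) % n) * n + l] * omega ** ((l * k) % n) for l in range(n))
        assert np.isclose(Az[k], expected)

assert np.allclose(F.conj().T @ F, np.eye(n))            # F is unitary
assert np.allclose(F @ y, np.fft.fft(y, norm="ortho"))   # same sign convention as NumPy
# F_2 acts on an n x n array (first index k, second index l) along axis 1.
v = z.reshape(n, n)
assert np.allclose(np.fft.fft(v, axis=1, norm="ortho"), v @ F.T)
```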


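Lemma 8 Let $A_q$ be as given in (12). Then, for $\lambda \in \mathbb{Z}_n\times\mathbb{Z}_n$ and $q \in \mathbb{Z}_n$,

$$A_qe_\lambda = \pi(\lambda)e_q, \tag{28}$$

$$\sum_{q=0}^{n-1}A_q^*A_q = nI, \tag{29}$$

$$\sum_{q=0}^{n-1}A_qP_\lambda A_q^* = I, \tag{30}$$

$$\sum_{q=0}^{n-1}\sum_{q'=0}^{n-1}|x^*A_{q'}^*A_qy|^2 \le n\,\|x\|_0\,\|x\|_2^2\,\|y\|_2^2. \tag{31}$$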

Proof: For (28), observe that

$$\big(A_qe_{(k_0,\ell_0)}\big)_k = \sum_\ell \delta(k-q-k_0,\ \ell-\ell_0)\,\omega^{\ell k} = \delta\big(q-(k-k_0)\big)\,\omega^{\ell_0k} = \big(\pi(k_0,\ell_0)e_q\big)_k.$$
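To see (29), choose $z \in \mathbb{C}^{n\times n}$ and compute

$$(A_{q'}^*A_qz)_{(k',\ell')} = \sum_\ell \omega^{(\ell-\ell')(k'+q')}\,z_{(k'+q'-q,\ell)}.$$

Hence,

$$\sum_q(A_q^*A_qz)_{(k',\ell')} = \sum_q\sum_\ell \omega^{(\ell-\ell')(k'+q)}\,z_{(k',\ell)} = \sum_\ell z_{(k',\ell)}\sum_q\omega^{(\ell-\ell')(k'+q)} = \sum_\ell z_{(k',\ell)}\,n\,\delta(\ell-\ell') = n\,z_{(k',\ell')}.$$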

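Next, for (30), observe that all but one column of $A_qP_{(k_0,\ell_0)}$ vanish, the nonzero column being column $(k_0,\ell_0)$, and only its $(k_0+q)$th entry is nonzero, namely, it equals $\omega^{\ell_0(k_0+q)}$. We have

$$A_qP_{(k_0,\ell_0)}A_q^* = A_qP_{(k_0,\ell_0)}P_{(k_0,\ell_0)}A_q^* = A_qP_{(k_0,\ell_0)}\big(A_qP_{(k_0,\ell_0)}\big)^*,$$

and hence $A_qP_{(k_0,\ell_0)}A_q^* = P_{k_0+q}$, the orthogonal projection onto $\operatorname{span}\{e_{k_0+q}\}$ in $\mathbb{C}^n$, and $\sum_qA_qP_{(k_0,\ell_0)}A_q^* = I$.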
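Finally, let $x \in \mathbb{C}^{n\times n}$ and $\Lambda = \operatorname{supp}x$. Then, by the Cauchy-Schwarz inequality,

$$\begin{aligned}
\sum_q\sum_{q'}|x^*A_{q'}^*A_qy|^2 &= \sum_q\sum_{q'}\Big|\sum_{(k',\ell')\in\Lambda}\overline{x_{(k',\ell')}}\,(A_{q'}^*A_qy)_{(k',\ell')}\Big|^2\\
&\le \|x\|_2^2\sum_q\sum_{q'}\sum_{(k',\ell')\in\Lambda}\big|(A_{q'}^*A_qy)_{(k',\ell')}\big|^2\\
&= \|x\|_2^2\sum_q\sum_{q'}\sum_{(k',\ell')\in\Lambda}\Big|\sum_\ell\omega^{(\ell-\ell')(k'+q')}\,y_{(k'+q'-q,\ell)}\Big|^2\\
&= n\,\|x\|_2^2\sum_{(k',\ell')\in\Lambda}\sum_q\sum_{q'}\big|(\mathcal{F}_2^{-1}y)_{(k'-(q-q'),\,k'+q')}\big|^2\\
&= n\,\|x\|_2^2\sum_{(k',\ell')\in\Lambda}\|\mathcal{F}_2^{-1}y\|_2^2 = n\,|\Lambda|\,\|x\|_2^2\,\|y\|_2^2 = n\,\|x\|_0\,\|x\|_2^2\,\|y\|_2^2,
\end{aligned}$$

since, for fixed $(k',\ell')$, the map $(q,q') \mapsto (k'-(q-q'),\,k'+q')$ is a bijection of $\mathbb{Z}_n\times\mathbb{Z}_n$, and by unitarity of $\mathcal{F}_2$. ■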
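All four statements of Lemma 8 can be confirmed numerically for a small $n$. The sketch below assumes $\pi(k,\ell) = M^\ell T^k$ with $(Th)_j = h_{j-1}$ and $(Mh)_j = \omega^jh_j$, and the flattening $\lambda = (k,\ell) \mapsto kn+\ell$; these conventions and the helper names are ours. It builds $A_q$ from the explicit formula of Section 3.1 and then checks (28), (29), (30), and (31) for a random 3-sparse $x$.

```python
import numpy as np

n = 4
omega = np.exp(2j * np.pi / n)

def tf_shift(k, l):
    """pi(k, l) = M^l T^k, i.e. (pi(k, l) h)_j = omega^{l j} h_{j-k}."""
    P = np.zeros((n, n), dtype=complex)
    for j in range(n):
        P[j, (j - k) % n] = omega ** ((l * j) % n)
    return P

def A_matrix(q):
    """A_q with column k*n + l equal to omega^{l(k+q)} e_{k+q}."""
    A = np.zeros((n, n * n), dtype=complex)
    for k in range(n):
        for l in range(n):
            A[(k + q) % n, k * n + l] = omega ** ((l * (k + q)) % n)
    return A

As = [A_matrix(q) for q in range(n)]
I_n, I_n2 = np.eye(n), np.eye(n * n)

# (28): A_q e_lambda = pi(lambda) e_q
for q in range(n):
    for k in range(n):
        for l in range(n):
            assert np.allclose(As[q] @ I_n2[:, k * n + l], tf_shift(k, l)[:, q])

# (29): sum_q A_q^* A_q = n I
assert np.allclose(sum(A.conj().T @ A for A in As), n * I_n2)

# (30): sum_q A_q P_lambda A_q^* = I for every lambda
for lam in range(n * n):
    P = np.outer(I_n2[:, lam], I_n2[:, lam])
    assert np.allclose(sum(A @ P @ A.conj().T for A in As), I_n)

# (31): sum_{q,q'} |x^* A_q'^* A_q y|^2 <= n ||x||_0 ||x||_2^2 ||y||_2^2
rng = np.random.default_rng(2)
s = 3
x = np.zeros(n * n, dtype=complex)
x[rng.choice(n * n, s, replace=False)] = rng.standard_normal(s) + 1j * rng.standard_normal(s)
y = rng.standard_normal(n * n) + 1j * rng.standard_normal(n * n)
lhs = sum(abs(x.conj() @ Ap.conj().T @ A @ y) ** 2 for Ap in As for A in As)
rhs = n * np.count_nonzero(x) * np.linalg.norm(x) ** 2 * np.linalg.norm(y) ** 2
print(lhs, "<=", rhs)
```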

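3.2 Proof of Lemma 4

For $x, y \in \mathbb{C}^{n\times n}$, we have $B(x) - B(y) = B(x,x-y) + B(x-y,y)$, so that

$$d_2(x,y) \le \Big(\sum_{q'\neq q}|x^*A_{q'}^*A_q(x-y)|^2\Big)^{1/2} + \Big(\sum_{q'\neq q}|(x-y)^*A_{q'}^*A_qy|^2\Big)^{1/2}.$$

Inequality (31) implies that for $x, y \in T_s$,

$$\Big(\sum_{q'\neq q}|x^*A_{q'}^*A_q(x-y)|^2\Big)^{1/2},\ \Big(\sum_{q'\neq q}|(x-y)^*A_{q'}^*A_qy|^2\Big)^{1/2} \le \sqrt{sn}\,\|x-y\|_2$$

(for the second term, apply (31) after interchanging the roles of $q$ and $q'$ and taking complex conjugates), and, hence,

$$d_2(x,y) \le 2\sqrt{sn}\,\|x-y\|_2. \tag{32}$$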
Using the volumetric argument, see, for example, [44, Proposition 10.1], we obtain

$$N(T_s, \|\cdot\|_2, u) \le \binom{n^2}{s}(1+2/u)^s \le (en^2/s)^s(1+2/u)^s.$$

By a rescaling argument,

$$N(T_s, d_2, u) \le N(T_s, 2\sqrt{sn}\,\|\cdot\|_2, u) = N\big(T_s, \|\cdot\|_2, u/(2\sqrt{sn})\big) \le (en^2/s)^s\big(1+4\sqrt{sn}\,u^{-1}\big)^s.$$

Taking the logarithm completes the proof.
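The following sketch implements the bilinear form (27) under the same assumed conventions as the checks above (flattening $\lambda = (k,\ell) \mapsto kn+\ell$; helper names ours). It verifies the decomposition $B(x) - B(y) = B(x,x-y) + B(x-y,y)$ that underlies the first display of this proof, and it measures the Lipschitz bound (32) on random pairs in $T_s$. The values $n = 5$ and $s = 2$ are arbitrary.

```python
import numpy as np

n, s = 5, 2
omega = np.exp(2j * np.pi / n)

def A_matrix(q):
    """A_q with column k*n + l equal to omega^{l(k+q)} e_{k+q}."""
    A = np.zeros((n, n * n), dtype=complex)
    for k in range(n):
        for l in range(n):
            A[(k + q) % n, k * n + l] = omega ** ((l * (k + q)) % n)
    return A

As = [A_matrix(q) for q in range(n)]

def B(x, z):
    """Bilinear form (27): entry (q', q) is x^* A_q'^* A_q z for q' != q, zero diagonal."""
    U = np.column_stack([A @ x for A in As])   # column q' is A_q' x
    V = np.column_stack([A @ z for A in As])   # column q is A_q z
    M = U.conj().T @ V                         # M[q', q] = (A_q' x)^* (A_q z)
    np.fill_diagonal(M, 0)
    return M

def random_unit_sparse(rng, support):
    x = np.zeros(n * n, dtype=complex)
    x[support] = rng.standard_normal(len(support)) + 1j * rng.standard_normal(len(support))
    return x / np.linalg.norm(x)

rng = np.random.default_rng(3)
worst = 0.0
for _ in range(200):
    x = random_unit_sparse(rng, rng.choice(n * n, s, replace=False))
    y = random_unit_sparse(rng, rng.choice(n * n, s, replace=False))
    assert np.allclose(B(x, x) - B(y, y), B(x, x - y) + B(x - y, y))
    d2 = np.linalg.norm(B(x, x) - B(y, y), "fro")
    worst = max(worst, d2 / (2 * np.sqrt(s * n) * np.linalg.norm(x - y)))
print("largest ratio d_2(x, y) / (2 sqrt(sn) ||x - y||_2):", worst)
```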

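3.3 Proof of Lemma 5

Now, we seek a suitable estimate of the covering numbers $N(T_s, d_2, u)$ for $u \ge \sqrt n$. Observe that, by (32), the diameter of $T_s$ with respect to $d_2$ is at most $4\sqrt{sn}$. Hence, it suffices to consider $N(T_s, d_2, u)$ for

$$\sqrt n \le u \le 4\sqrt{sn}, \tag{33}$$

as stated in the lemma. We use the empirical method [14], similarly as in [49]. We define the norm $\|\cdot\|_*$ on $\mathbb{C}^{n\times n}$ by

$$\|x\|_* = \sum_\lambda |\operatorname{Re}x_\lambda| + |\operatorname{Im}x_\lambda|. \tag{34}$$

For $x \in T_s$ we define a random vector $Z$, which takes the value $\|x\|_*\operatorname{sgn}(\operatorname{Re}x_\lambda)e_\lambda$ with probability $|\operatorname{Re}x_\lambda|/\|x\|_*$, and the value $i\|x\|_*\operatorname{sgn}(\operatorname{Im}x_\lambda)e_\lambda$ with probability $|\operatorname{Im}x_\lambda|/\|x\|_*$. Note that $\mathbb{E}Z = x$.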
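A minimal sketch of how the random vector $Z$ can be sampled, assuming vectors in $\mathbb{C}^{n\times n}$ are stored flattened (here $n = 4$, with an arbitrary 3-sparse test vector). The helper names `star_norm` and `sample_Z` are hypothetical. The empirical mean of many independent copies approaches $x$, which illustrates $\mathbb{E}Z = x$, and each copy is 1-sparse.

```python
import numpy as np

def star_norm(x):
    """||x||_* = sum_lambda |Re x_lambda| + |Im x_lambda|, cf. (34)."""
    return np.sum(np.abs(x.real) + np.abs(x.imag))

def sample_Z(x, size, rng):
    """Return `size` independent copies of Z (one per row).

    Z = ||x||_* sgn(Re x_lam) e_lam with probability |Re x_lam| / ||x||_*,
    Z = i ||x||_* sgn(Im x_lam) e_lam with probability |Im x_lam| / ||x||_*.
    """
    N = x.size
    weights = np.concatenate([np.abs(x.real), np.abs(x.imag)])
    phases = np.concatenate([np.sign(x.real), 1j * np.sign(x.imag)])
    idx = rng.choice(2 * N, size=size, p=weights / weights.sum())
    Z = np.zeros((size, N), dtype=complex)
    Z[np.arange(size), idx % N] = star_norm(x) * phases[idx]
    return Z

rng = np.random.default_rng(4)
x = np.zeros(16, dtype=complex)            # a 3-sparse vector in C^{4x4}, flattened
x[[1, 5, 11]] = [0.6 + 0.2j, -0.3j, -0.5 + 0.5j]
x /= np.linalg.norm(x)

Z = sample_Z(x, 200_000, rng)
print("max |mean(Z) - x| =", np.abs(Z.mean(axis=0) - x).max())   # E Z = x
print("||x||_* =", star_norm(x), "<= sqrt(2s) =", np.sqrt(6))
```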

Now, let $Z_1,\dots,Z_m, Z'_1,\dots,Z'_m$ be independent copies of $Z$. We set $y = \frac1m\sum_{j=1}^mZ_j$ and $y' = \frac1m\sum_{j=1}^mZ'_j$ and attempt to approximate $B(x)$ by

$$\hat B := B(y,y') = \frac{1}{m^2}\sum_{j,j'=1}^mB(Z_j,Z'_{j'}). \tag{35}$$


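First, with $W_{q',q} = A_{q'}^*A_q$ for $q' \neq q$ and $W_{q,q} = 0$ (so that $B(x)_{q',q} = x^*W_{q',q}x$), compute

$$\begin{aligned}
\mathbb{E}\|\hat B - B(x)\|_F^2 &= \mathbb{E}\sum_{q',q}\Big|x^*W_{q',q}x - \frac{1}{m^2}\sum_{j,j'=1}^mZ_j^*W_{q',q}Z'_{j'}\Big|^2\\
&= \sum_{q',q}\Big(|x^*W_{q',q}x|^2 - 2\operatorname{Re}\Big(\overline{x^*W_{q',q}x}\,\frac{1}{m^2}\sum_{j,j'=1}^m\mathbb{E}\big[Z_j^*W_{q',q}Z'_{j'}\big]\Big) + \frac{1}{m^4}\sum_{j,j',j'',j'''=1}^m\mathbb{E}\Big[Z_j^*W_{q',q}Z'_{j'}\,\overline{Z_{j''}^*W_{q',q}Z'_{j'''}}\Big]\Big)\\
&= \sum_{q',q}\Big(-|x^*W_{q',q}x|^2 + \frac{1}{m^4}\sum_{j,j',j'',j'''=1}^m\mathbb{E}\Big[Z_j^*W_{q',q}Z'_{j'}\,\overline{Z_{j''}^*W_{q',q}Z'_{j'''}}\Big]\Big),
\end{aligned}$$

where we used that $\mathbb{E}[Z_j^*W_{q',q}Z'_{j'}] = x^*W_{q',q}x$, $j,j' = 1,\dots,m$, by independence. Moreover, for $j \neq j''$ and $j' \neq j'''$, independence implies

$$\mathbb{E}\Big[Z_j^*W_{q',q}Z'_{j'}\,\overline{Z_{j''}^*W_{q',q}Z'_{j'''}}\Big] = |x^*W_{q',q}x|^2.$$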

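To estimate summands with $j' = j'''$, note that, for $q' \neq q$,

$$Z_j^*W_{q',q}Z'_{j'}\,\overline{Z_{j''}^*W_{q',q}Z'_{j'}} = Z_j^*W_{q',q}Z'_{j'}(Z'_{j'})^*W_{q',q}^*Z_{j''} = \|x\|_*^2\,Z_j^*A_{q'}^*A_qP_{\{\lambda\}}A_q^*A_{q'}Z_{j''},$$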
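where $\{\lambda\} = \operatorname{supp}Z'_{j'}$ is random. Hence, in this case, we compute using (30) in Lemma 8

$$\begin{aligned}
\mathbb{E}\sum_{q'\neq q}\Big[Z_j^*W_{q',q}Z'_{j'}(Z'_{j'})^*W_{q',q}^*Z_{j''}\Big] &\le \|x\|_*^2\,\mathbb{E}\sum_{q'}\sum_qZ_j^*A_{q'}^*A_qP_{\{\lambda\}}A_q^*A_{q'}Z_{j''}\\
&= \|x\|_*^2\,\mathbb{E}\Big[Z_j^*\sum_{q'}A_{q'}^*\Big(\sum_qA_qP_{\{\lambda\}}A_q^*\Big)A_{q'}Z_{j''}\Big]\\
&= \|x\|_*^2\,\mathbb{E}\Big[Z_j^*\Big(\sum_{q'}A_{q'}^*A_{q'}\Big)Z_{j''}\Big] = n\|x\|_*^2\,\mathbb{E}[Z_j^*Z_{j''}]\\
&= \begin{cases} n\|x\|_*^4, & \text{if } j = j'',\\ n\|x\|_*^2\,\mathbb{E}[Z_j]^*\,\mathbb{E}[Z_{j''}] = n\|x\|_*^2\|x\|_2^2 = n\|x\|_*^2, & \text{else.}\end{cases}
\end{aligned}$$

Symmetry implies an identical estimate for $j = j''$, $j' \neq j'''$. As $x \in T_s$ is $s$-sparse, we have $\|x\|_* \le \sqrt2\|x\|_1 \le \sqrt{2s}\|x\|_2 = \sqrt{2s}$.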


E  E  E  {z]wq,q,z'y{z'y,rwiq,zy 

<  m2{m  —  l)2  ^  \x*  Wq,q'x\2  +  m2n4s2  +  2 m2(m  —  1  )n  •  2s. 
q',q 


For  m  >  11^2 2  and  u  <  4y/sn,  we  finally  obtain, 

E||B  -  B(x)\\2f  <  Y -\x*Wq',gx\- 
q',q 

m2n4s2  4ra2(ra  —  l)ns 


2(m2  —  1) 


mr 


Y\x*wq,q'x\ 


Q 


< 


+ 

rrv*  m1* 

4ns2  4ns  4ns2 

-  + -  < 


4?rs 


it2  < 


64ns 


44 


m  121n2s3  llns5  121ns  121-^s 


2  ^  2 
--U  <  u  . 


(36) 
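To see the empirical method at work, the following Monte Carlo sketch estimates $\mathbb{E}\|\hat B - B(x)\|_F^2$ for $\hat B = B(y,y')$ as in (35). It reuses the assumed conventions of the earlier sketches (flattened index $kn+\ell$, hypothetical helpers). It compares the estimate with the bound $n\|x\|_*^4/m^2 + 2(m-1)n\|x\|_*^2/m^2$, which the computation above gives after dropping the nonpositive first term and before inserting $\|x\|_*^2 \le 2s$. The test vector $x = (e_{(0,0)} + e_{(1,2)})/\sqrt2$, $m$ and the number of trials are arbitrary choices.

```python
import numpy as np

n, s, m, trials = 5, 2, 8, 2000
omega = np.exp(2j * np.pi / n)

def A_matrix(q):
    """A_q with column k*n + l equal to omega^{l(k+q)} e_{k+q}."""
    A = np.zeros((n, n * n), dtype=complex)
    for k in range(n):
        for l in range(n):
            A[(k + q) % n, k * n + l] = omega ** ((l * (k + q)) % n)
    return A

As = [A_matrix(q) for q in range(n)]

def B(x, z):
    """Bilinear form (27) with zero diagonal."""
    M = np.column_stack([A @ x for A in As]).conj().T @ np.column_stack([A @ z for A in As])
    np.fill_diagonal(M, 0)
    return M

def star_norm(x):
    return np.sum(np.abs(x.real) + np.abs(x.imag))

def sample_Z(x, size, rng):
    """Independent copies of the random vector Z of Section 3.3 (one per row)."""
    N = x.size
    w = np.concatenate([np.abs(x.real), np.abs(x.imag)])
    phase = np.concatenate([np.sign(x.real), 1j * np.sign(x.imag)])
    idx = rng.choice(2 * N, size=size, p=w / w.sum())
    Z = np.zeros((size, N), dtype=complex)
    Z[np.arange(size), idx % N] = star_norm(x) * phase[idx]
    return Z

# Fixed 2-sparse unit vector x = (e_(0,0) + e_(1,2)) / sqrt(2), stored at indices k*n + l.
x = np.zeros(n * n, dtype=complex)
x[[0, 1 * n + 2]] = 1 / np.sqrt(2)

rng = np.random.default_rng(5)
Bx = B(x, x)
errs = [np.linalg.norm(B(sample_Z(x, m, rng).mean(axis=0),      # y  = (1/m) sum_j Z_j
                         sample_Z(x, m, rng).mean(axis=0)) - Bx,  # y' = (1/m) sum_j Z'_j
                       "fro") ** 2
        for _ in range(trials)]

xs2 = star_norm(x) ** 2
bound = n * xs2 ** 2 / m ** 2 + 2 * (m - 1) * n * xs2 / m ** 2   # before ||x||_*^2 <= 2s
print(np.mean(errs), bound, 4 * n * s ** 2 / m ** 2 + 4 * n * s / m)
```

The printed Monte Carlo mean should sit well below the bound, since the negative term $-(2m-1)m^{-2}\|B(x)\|_F^2$ has been discarded.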


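Since $\|x\|_*$ can take any value in $[1, \sqrt{2s}]$, we still have to discretize this factor in the definition of the random variable $Z$. To this end, set

$$\hat B_\alpha := \frac{1}{m^2}\sum_{j,j'=1}^mB\big(\alpha\operatorname{sgn}(x_{\lambda_j})e_{\lambda_j},\ \alpha\operatorname{sgn}(x_{\lambda'_{j'}})e_{\lambda'_{j'}}\big),$$

where $\lambda_j$ and $\lambda'_{j'}$ are the supports of $Z_j$ and $Z'_{j'}$, and $\operatorname{sgn}(x_\lambda) \in \{1,-1,i,-i\}$ denotes the corresponding phase of $Z_j$ or $Z'_{j'}$.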


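Next, we observe that, for $\lambda = (k,\ell)$, $\lambda' = (k',\ell')$ and $q' \neq q$,

$$B(e_{\lambda'},e_\lambda)_{q',q} = (A_{q'}e_{\lambda'})^*A_qe_\lambda = \langle\pi(\lambda)e_q, \pi(\lambda')e_{q'}\rangle = \begin{cases}\omega^{(\ell-\ell')(k+q)}, & \text{if } k'+q' = k+q,\\ 0, & \text{else,}\end{cases} \tag{37}$$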
values $\alpha_k^2 \in J_s := [1, 2s]$, $k = 1,\dots,K$, such that for each $\beta \in J_s$ there exists $k$ satisfying $|\beta - \alpha_k^2| \le u/\sqrt n$.

Now, given $x$, we can find $z_1,\dots,z_m, z'_1,\dots,z'_m$ of the form $\|x\|_*p_\lambda e_\lambda$, $p_\lambda \in \{1,-1,i,-i\}$, such that $\|\hat B - B(x)\|_F \le u$. Further, we can find $k$ such that $\big|\|x\|_*^2 - \alpha_k^2\big| \le u/\sqrt n$. We replace $z_1,\dots,z_m, z'_1,\dots,z'_m$ by the respective $\tilde z_1,\dots,\tilde z_m, \tilde z'_1,\dots,\tilde z'_m$ of the form $\alpha_kp_\lambda e_\lambda$.

Then, using (36), (38) and the triangle inequality, we obtain

$$\Big\|B(x) - \frac{1}{m^2}\sum_{j,j'=1}^mB(\tilde z_j,\tilde z'_{j'})\Big\|_F \le 2u.$$
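Formula (37), which drives the discretization step, can be verified entry by entry. Under the same assumed conventions as the earlier sketches (flattened index $kn+\ell$, hypothetical helpers), the code below loops over all pairs $\lambda, \lambda'$ and all $(q',q)$. It also confirms that $B(e_{\lambda'},e_\lambda)$ vanishes when $k = k'$, because then no admissible $q' \neq q$ exists.

```python
import itertools
import numpy as np

n = 5
omega = np.exp(2j * np.pi / n)

def A_matrix(q):
    """A_q with column k*n + l equal to omega^{l(k+q)} e_{k+q}."""
    A = np.zeros((n, n * n), dtype=complex)
    for k in range(n):
        for l in range(n):
            A[(k + q) % n, k * n + l] = omega ** ((l * (k + q)) % n)
    return A

As = [A_matrix(q) for q in range(n)]

def B(x, z):
    """Bilinear form (27) with zero diagonal."""
    M = np.column_stack([A @ x for A in As]).conj().T @ np.column_stack([A @ z for A in As])
    np.fill_diagonal(M, 0)
    return M

E = np.eye(n * n)
for k, l, kp, lp in itertools.product(range(n), repeat=4):
    Bl = B(E[:, kp * n + lp], E[:, k * n + l])          # B(e_lambda', e_lambda)
    for qp, q in itertools.product(range(n), repeat=2):
        if qp != q and (kp + qp - k - q) % n == 0:
            expected = omega ** (((l - lp) * (k + q)) % n)
        else:
            expected = 0
        assert np.isclose(Bl[qp, q], expected)
    if k == kp:
        assert np.allclose(Bl, 0)
print("(37) verified for n =", n)
```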

Now, each $\tilde z_j, \tilde z'_j$ can take at most $\lceil2s\sqrt n/u\rceil\cdot4\cdot n^2$ values, so that

$$\frac{1}{m^2}\sum_{j,j'=1}^mB(\tilde z_j,\tilde z'_{j'})$$

can take at most $(4\lceil2s\sqrt n/u\rceil n^2)^{2m} \le (Csn^{5/2}/u)^{2m}$ values. Hence, we found a $2u$-covering of the set of matrices $B(x)$ with $x \in T_s$ of cardinality at most $(Csn^{5/2}/u)^{2m}$. Unfortunately, the matrices of the covering are not necessarily of the form $B(x)$. Nevertheless, we may replace each relevant matrix $\frac{1}{m^2}\sum_{j,j'=1}^mB(\tilde z_j,\tilde z'_{j'})$ by a matrix $B(\tilde x)$ with

$$\Big\|B(\tilde x) - \frac{1}{m^2}\sum_{j,j'=1}^mB(\tilde z_j,\tilde z'_{j'})\Big\|_F \le 2u.$$

(Clearly, if for a matrix $\frac{1}{m^2}\sum_{j,j'}B(\tilde z_j,\tilde z'_{j'})$ there is no such $\tilde x$, then we can discard that matrix.) Again, the set of such chosen $\tilde x$ has cardinality at most $(Csn^{5/2}/u)^{2m}$ and, by the triangle inequality, for each $x$ we can find $\tilde x$ in the covering such that

$$d_2(x,\tilde x) \le 4u.$$

For $m \ge 11u^{-2}ns^{3/2}$, we consequently get

$$\log N(T_s, d_2, 4u) \le \log\big((Csn^{5/2}/u)^{2m}\big) = 2m\log(Csn^{5/2}/u).$$

The choice $m = \lceil11u^{-2}ns^{3/2}\rceil \le 27u^{-2}ns^{3/2}$ and rescaling $u$ give

$$\log N(T_s, d_2, u) \le 864\,u^{-2}ns^{3/2}\log(4Csn^{5/2}/u) \le c\,u^{-2}ns^{3/2}\log(sn^{5/2}/u).$$

This completes the proof of Lemma 5.
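For the reader's convenience, here is one way to verify the two elementary estimates used in the choice of $m$ and in the counting argument, assuming $u \le 4\sqrt{sn}$ as in (33). The value $C = 24$ is just one admissible constant, not necessarily the one intended in the text.

```latex
a := u^{-2}ns^{3/2} \ \ge\ \frac{ns^{3/2}}{16sn} = \frac{\sqrt{s}}{16} \ \ge\ \frac{1}{16}
\quad\Longrightarrow\quad
m = \lceil 11a \rceil \le 11a + 1 \le 11a + 16a = 27a,

b := \frac{2s\sqrt{n}}{u} \ \ge\ \frac{2s\sqrt{n}}{4\sqrt{sn}} = \frac{\sqrt{s}}{2} \ \ge\ \frac{1}{2}
\quad\Longrightarrow\quad
4\lceil b \rceil n^2 \le 4(b+1)n^2 \le 12\,b\,n^2 = \frac{24\,sn^{5/2}}{u}.
```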

3.4 Proof of Lemma 6, Part I

Now we show the estimate

$$\log N(T_s, d_{2\to2}, u) \le s\log(en^2/s) + s\log(1+4su^{-1}),$$

which will establish one part of (20). Before doing so, we note that one can quickly obtain an estimate for $N(T_s, d_{2\to2}, u)$ for small $u$ by using that the Frobenius norm dominates the operator norm, and, hence, $d_{2\to2}(x,y) \le d_2(x,y) \le 2\sqrt{sn}\,\|x-y\|_2$. In fact, this estimate would not deteriorate the estimate in Theorem 1(a). But in the proof of Theorem 1(b), the more involved estimate $d_{2\to2}(x,y) \le 2s\|x-y\|_2$ developed below is useful.

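Let us first rewrite $d_{2\to2}$. Recall (28) in Lemma 8, namely, $A_qe_\lambda = \pi(\lambda)e_q$. With $\lambda = (k,\ell)$ and $\lambda' = (k',\ell')$, we obtain

$$\pi(\lambda')^*\pi(\lambda) = \omega^{k'(\ell-\ell')}\,\pi(\lambda-\lambda') =: \omega(\lambda,\lambda')\,\pi(\lambda-\lambda').$$

Writing now $x = \sum_\lambda x_\lambda e_\lambda$, the entries of the matrix $B(x)$ in (27) for $q' \neq q$ are given by

$$\begin{aligned}
B(x)_{q',q} &= \sum_{\lambda,\lambda'}\overline{x_{\lambda'}}\,x_\lambda\,e_{\lambda'}^*A_{q'}^*A_qe_\lambda = \sum_{\lambda,\lambda'}\overline{x_{\lambda'}}\,x_\lambda\,e_{q'}^*\pi(\lambda')^*\pi(\lambda)e_q\\
&= \sum_{\lambda,\lambda'}\overline{x_{\lambda'}}\,x_\lambda\,\omega(\lambda,\lambda')\,e_{q'}^*\pi(\lambda-\lambda')e_q = \sum_{k\neq k'}\overline{x_{\lambda'}}\,x_\lambda\,\omega(\lambda,\lambda')\,e_{q'}^*\pi(\lambda-\lambda')e_q\\
&= e_{q'}^*\Big(\sum_{k\neq k'}\overline{x_{\lambda'}}\,x_\lambda\,\omega(\lambda,\lambda')\,\pi(\lambda-\lambda')\Big)e_q,
\end{aligned}$$

where $\sum_{k\neq k'}$ denotes the sum over all pairs $\lambda = (k,\ell)$, $\lambda' = (k',\ell')$ with $k \neq k'$. We used for the fourth equality that $e_{q'}^*\pi(k_0,\ell_0)e_q = 0$ if $q' \neq q$ and $k_0 = 0$. Since, in addition, $\pi(k_0,\ell_0)$ has a vanishing diagonal if $k_0 \neq 0$, this shows that

$$B(x) = \sum_{k\neq k'}\overline{x_{\lambda'}}\,x_\lambda\,\omega(\lambda,\lambda')\,\pi(\lambda-\lambda').$$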
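Both the commutation relation $\pi(\lambda')^*\pi(\lambda) = \omega(\lambda,\lambda')\pi(\lambda-\lambda')$ and the resulting representation of $B(x)$ can be checked numerically. This sketch assumes $\pi(k,\ell) = M^\ell T^k$ with $(Th)_j = h_{j-1}$, $(Mh)_j = \omega^jh_j$, and the flattening $\lambda \mapsto kn+\ell$; all helper names are ours.

```python
import itertools
import numpy as np

n, s = 5, 3
omega = np.exp(2j * np.pi / n)

def tf_shift(k, l):
    """pi(k, l) = M^l T^k, i.e. (pi(k, l) h)_j = omega^{l j} h_{j-k}."""
    P = np.zeros((n, n), dtype=complex)
    for j in range(n):
        P[j, (j - k) % n] = omega ** ((l * j) % n)
    return P

def A_matrix(q):
    """A_q with column k*n + l equal to omega^{l(k+q)} e_{k+q}."""
    A = np.zeros((n, n * n), dtype=complex)
    for k in range(n):
        for l in range(n):
            A[(k + q) % n, k * n + l] = omega ** ((l * (k + q)) % n)
    return A

As = [A_matrix(q) for q in range(n)]

def B(x, z):
    """Bilinear form (27) with zero diagonal."""
    M = np.column_stack([A @ x for A in As]).conj().T @ np.column_stack([A @ z for A in As])
    np.fill_diagonal(M, 0)
    return M

def w(lam, lamp):
    """omega(lambda, lambda') = omega^{k'(l - l')}."""
    (k, l), (kp, lp) = lam, lamp
    return omega ** ((kp * (l - lp)) % n)

pairs = list(itertools.product(range(n), repeat=2))

# pi(lambda')^* pi(lambda) = omega(lambda, lambda') pi(lambda - lambda')
for lam, lamp in itertools.product(pairs, repeat=2):
    lhs = tf_shift(*lamp).conj().T @ tf_shift(*lam)
    rhs = w(lam, lamp) * tf_shift(lam[0] - lamp[0], lam[1] - lamp[1])
    assert np.allclose(lhs, rhs)

# B(x) = sum over k != k' of conj(x_lambda') x_lambda omega(lambda, lambda') pi(lambda - lambda')
rng = np.random.default_rng(6)
x = np.zeros(n * n, dtype=complex)
x[rng.choice(n * n, s, replace=False)] = rng.standard_normal(s) + 1j * rng.standard_normal(s)
R = np.zeros((n, n), dtype=complex)
for lam, lamp in itertools.product(pairs, repeat=2):
    if lam[0] != lamp[0]:
        coeff = np.conj(x[lamp[0] * n + lamp[1]]) * x[lam[0] * n + lam[1]]
        R += coeff * w(lam, lamp) * tf_shift(lam[0] - lamp[0], lam[1] - lamp[1])
print("representation error:", np.abs(R - B(x, x)).max())
```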

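The estimate (26) for the Schatten norms shows, using that $B(x) - B(y)$ is Hermitian (so that its $S_{2p}$-norm to the power $2p$ equals $\operatorname{Tr}\big((B(x)-B(y))^{2p}\big)$), and writing $\lambda_i = (k_i,\ell_i)$, $\lambda'_i = (k'_i,\ell'_i)$,

$$\begin{aligned}
d_{2\to2}(x,y)^{2p} &= \Big\|\sum_{k\neq k'}\big(\overline{x_{\lambda'}}x_\lambda - \overline{y_{\lambda'}}y_\lambda\big)\,\omega(\lambda,\lambda')\,\pi(\lambda-\lambda')\Big\|_{2\to2}^{2p} \le \Big\|\sum_{k\neq k'}\big(\overline{x_{\lambda'}}x_\lambda - \overline{y_{\lambda'}}y_\lambda\big)\,\omega(\lambda,\lambda')\,\pi(\lambda-\lambda')\Big\|_{S_{2p}}^{2p}\\
&= \sum_{k_1\neq k'_1,\,\dots,\,k_{2p}\neq k'_{2p}}\big(\overline{x_{\lambda'_1}}x_{\lambda_1} - \overline{y_{\lambda'_1}}y_{\lambda_1}\big)\cdots\big(\overline{x_{\lambda'_{2p}}}x_{\lambda_{2p}} - \overline{y_{\lambda'_{2p}}}y_{\lambda_{2p}}\big)\\
&\qquad\times\,\omega(\lambda_1,\lambda'_1)\cdots\omega(\lambda_{2p},\lambda'_{2p})\,\operatorname{Tr}\big(\pi(\lambda_1-\lambda'_1)\cdots\pi(\lambda_{2p}-\lambda'_{2p})\big).
\end{aligned}$$

Setting $(k_0,\ell_0) = \lambda_1 - \lambda'_1 + \lambda_2 - \lambda'_2 + \dots + \lambda_{2p} - \lambda'_{2p}$, we observe that the product $\pi(\lambda_1-\lambda'_1)\cdots\pi(\lambda_{2p}-\lambda'_{2p})$ is a unimodular multiple of $\pi(k_0,\ell_0)$. Hence its trace sums over zero entries if $k_0 \neq 0$, and sums over roots of unity to zero if $\ell_0 \neq 0$. We conclude that

$$\Big|\operatorname{Tr}\big(\pi(\lambda_1-\lambda'_1)\cdots\pi(\lambda_{2p}-\lambda'_{2p})\big)\Big| \le n\,\delta_{0,\ \lambda_1-\lambda'_1+\lambda_2-\lambda'_2+\dots+\lambda_{2p}-\lambda'_{2p}}.$$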
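Hence, dropping the restrictions $k_i \neq k'_i$,

$$d_{2\to2}(x,y)^{2p} \le n\sum_{\lambda_1,\lambda'_1}\big|\overline{x_{\lambda'_1}}x_{\lambda_1} - \overline{y_{\lambda'_1}}y_{\lambda_1}\big|\cdots\sum_{\lambda_{2p-1},\lambda'_{2p-1}}\big|\overline{x_{\lambda'_{2p-1}}}x_{\lambda_{2p-1}} - \overline{y_{\lambda'_{2p-1}}}y_{\lambda_{2p-1}}\big|\ \sum_{\lambda_{2p}}\big|\overline{x_{t+\lambda_{2p}}}\,x_{\lambda_{2p}} - \overline{y_{t+\lambda_{2p}}}\,y_{\lambda_{2p}}\big|,$$

where $t = \lambda_1 - \lambda'_1 + \dots + \lambda_{2p-1} - \lambda'_{2p-1}$, so that the constraint forces $\lambda'_{2p} = t + \lambda_{2p}$. Now observe that, by the Cauchy-Schwarz inequality,

$$\sum_\lambda\big|\overline{x_{t+\lambda}}\,x_\lambda - \overline{y_{t+\lambda}}\,y_\lambda\big| \le \sum_\lambda|x_{t+\lambda}|\,|x_\lambda - y_\lambda| + \sum_\lambda|x_{t+\lambda} - y_{t+\lambda}|\,|y_\lambda| \le \|x\|_2\|x-y\|_2 + \|x-y\|_2\|y\|_2 = (\|x\|_2 + \|y\|_2)\|x-y\|_2.$$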


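We obtain similarly

$$\sum_{\lambda,\lambda'}\big|\overline{x_{\lambda'}}x_\lambda - \overline{y_{\lambda'}}y_\lambda\big| \le \sum_{\lambda,\lambda'}|x_{\lambda'}|\,|x_\lambda - y_\lambda| + |y_\lambda|\,|x_{\lambda'} - y_{\lambda'}| \le (\|x\|_1 + \|y\|_1)\|x-y\|_1.$$

For $x, y$ with $\operatorname{supp}x = \operatorname{supp}y = \Lambda$ for some $|\Lambda| \le s$ and $\|x\|_2 = \|y\|_2 = 1$, we have $\|x\|_1 \le \sqrt s\|x\|_2 = \sqrt s$ (and similarly for $y$) as well as $\|x-y\|_1 \le \sqrt s\|x-y\|_2$. Hence,

$$(\|x\|_1 + \|y\|_1)\|x-y\|_1 \le 2s\|x-y\|_2.$$

This finally yields

$$d_{2\to2}(x,y)^{2p} \le 2^{2p}ns^{2p-1}\|x-y\|_2^{2p}$$

for such $x, y$. As this holds for all $p \in \mathbb{N}$, letting $p \to \infty$ we conclude that

$$d_{2\to2}(x,y) \le 2s\|x-y\|_2. \tag{39}$$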
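A numerical sketch of (39) and of the intermediate trace bound $\operatorname{Tr}\big((B(x)-B(y))^{2p}\big) \le 2^{2p}ns^{2p-1}\|x-y\|_2^{2p}$ on random unit vectors sharing an $s$-sparse support. Conventions and helper names are the same assumptions as in the earlier sketches; the sizes are arbitrary.

```python
import numpy as np

n, s = 5, 3
omega = np.exp(2j * np.pi / n)

def A_matrix(q):
    """A_q with column k*n + l equal to omega^{l(k+q)} e_{k+q}."""
    A = np.zeros((n, n * n), dtype=complex)
    for k in range(n):
        for l in range(n):
            A[(k + q) % n, k * n + l] = omega ** ((l * (k + q)) % n)
    return A

As = [A_matrix(q) for q in range(n)]

def B(x, z):
    """Bilinear form (27) with zero diagonal."""
    M = np.column_stack([A @ x for A in As]).conj().T @ np.column_stack([A @ z for A in As])
    np.fill_diagonal(M, 0)
    return M

def unit_on(support, rng):
    v = np.zeros(n * n, dtype=complex)
    v[support] = rng.standard_normal(len(support)) + 1j * rng.standard_normal(len(support))
    return v / np.linalg.norm(v)

rng = np.random.default_rng(7)
worst = 0.0
for _ in range(300):
    supp = rng.choice(n * n, s, replace=False)
    x, y = unit_on(supp, rng), unit_on(supp, rng)    # same support, unit norm
    D = B(x, x) - B(y, y)
    assert np.allclose(D, D.conj().T)                # B(x) - B(y) is Hermitian
    dist = np.linalg.norm(x - y)
    for p in (1, 2, 3):
        # ||D||_{S_2p}^{2p} = Tr(D^{2p}) <= 2^{2p} n s^{2p-1} ||x - y||_2^{2p}
        tr = np.trace(np.linalg.matrix_power(D, 2 * p)).real
        assert tr <= 2 ** (2 * p) * n * s ** (2 * p - 1) * dist ** (2 * p) * (1 + 1e-9)
    worst = max(worst, np.linalg.norm(D, 2) / (2 * s * dist))
print("largest ratio d_22(x, y) / (2 s ||x - y||_2):", worst)
```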

With the volumetric argument, see for example [44, Proposition 10.1], we obtain the bound

$$\log N(T_s, \|\cdot\|_2, u) \le s\log(en^2/s) + s\log(1+2/u).$$

Rescaling yields

$$\log N(T_s, d_{2\to2}, u) \le \log N(T_s, 2s\|\cdot\|_2, u) = \log N\big(T_s, \|\cdot\|_2, u/(2s)\big) \le s\log(en^2/s) + s\log(1+4su^{-1}),$$

which is the claimed inequality.

3.5 Proof of Lemma 6, Part II

Next we establish the remaining estimate of (20),

$$\log N(T_s, d_{2\to2}, u) \le cu^{-2}s^2\log(2n)\log(n^2/u).$$

To this end, we again use the empirical method as in Section 3.3.

For $x \in T_s$, we define $Z_1,\dots,Z_m$ and $Z'_1,\dots,Z'_m$ as in Section 3.3, that is, each independently takes the value $\|x\|_*\operatorname{sgn}(\operatorname{Re}x_\lambda)e_\lambda$ with probability $|\operatorname{Re}x_\lambda|/\|x\|_*$, and the value $i\|x\|_*\operatorname{sgn}(\operatorname{Im}x_\lambda)e_\lambda$ with probability $|\operatorname{Im}x_\lambda|/\|x\|_*$. As before, we set

$$B(Z,Z') = \big(Z^*W_{q',q}Z'\big)_{q',q}, \tag{40}$$

where $W_{q',q} = A_{q'}^*A_q$ for $q' \neq q$ and $W_{q,q} = 0$, and attempt to approximate $B(x)$ with

1  m 

(41) 

3= 1 

That  is,  we  will  estimate  E|j.B  —  B(x) || 2—^2 - 

We will use symmetrization as formulated in the following lemma [44, Lemma 6.7], see also [33, Lemma 6.3], [39, Lemma 1.2.6]. Note that we will use this result with $Y_j = B(Z_j,Z'_j)$.

Lemma 9 (Symmetrization) Assume that $(Y_j)_{j=1}^m$ is a sequence of independent random vectors in $\mathbb{C}^r$ equipped with a (semi-)norm $\|\cdot\|$, having expectations $\beta_j = \mathbb{E}Y_j$. Then for $1 \le p < \infty$,

$$\Big(\mathbb{E}\Big\|\sum_{j=1}^m(Y_j - \beta_j)\Big\|^p\Big)^{1/p} \le 2\Big(\mathbb{E}\Big\|\sum_{j=1}^m\epsilon_jY_j\Big\|^p\Big)^{1/p}, \tag{42}$$

where $(\epsilon_j)_{j=1}^m$ is a Rademacher sequence independent of $(Y_j)_{j=1}^m$.
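Lemma 9 can be checked exactly on a toy example by enumerating all outcomes. In the following sketch the $Y_j$ are i.i.d. copies of a hypothetical two-point random vector in $\mathbb{R}^2$ (values and probabilities are arbitrary), the norm is Euclidean, and both sides of (42) are computed exactly for a few values of $p$.

```python
import itertools
import numpy as np

vals = np.array([[1.0, 0.0], [0.3, 2.0]])   # hypothetical two-point distribution of each Y_j
probs = np.array([0.7, 0.3])
m = 4
beta = probs @ vals                          # beta_j = E Y_j (identical for all j here)

def both_sides(p):
    """Exact left- and right-hand sides of (42) for the toy model above."""
    lhs = rhs = 0.0
    for pattern in itertools.product(range(2), repeat=m):
        weight = np.prod(probs[list(pattern)])      # probability of this outcome of (Y_1, ..., Y_m)
        Y = vals[list(pattern)]                     # row j is Y_j
        lhs += weight * np.linalg.norm((Y - beta).sum(axis=0)) ** p
        for signs in itertools.product((-1.0, 1.0), repeat=m):
            rhs += weight * 0.5 ** m * np.linalg.norm(np.array(signs) @ Y) ** p
    return lhs ** (1 / p), 2 * rhs ** (1 / p)

for p in (1, 2, 4):
    print(p, both_sides(p))
```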

To estimate the $2p$-th moment of $\|B(x) - \hat B\|_{2\to2}$, we will use the noncommutative Khintchine inequality [7, 44], which makes use of the Schatten $p$-norms introduced in (24).

Theorem 10 (Noncommutative Khintchine inequality) Let $\epsilon = (\epsilon_1,\dots,\epsilon_m)$ be a Rademacher sequence, and let $A_j$, $j = 1,\dots,m$, be complex matrices of the same dimension. Choose $p \in \mathbb{N}$. Then

$$\mathbb{E}\Big\|\sum_{j=1}^m\epsilon_jA_j\Big\|_{S_{2p}}^{2p} \le \frac{(2p)!}{2^pp!}\max\Big\{\Big\|\Big(\sum_{j=1}^mA_jA_j^*\Big)^{1/2}\Big\|_{S_{2p}}^{2p},\ \Big\|\Big(\sum_{j=1}^mA_j^*A_j\Big)^{1/2}\Big\|_{S_{2p}}^{2p}\Big\}. \tag{43}$$
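For small $m$, the expectation in (43) is a finite average over the $2^m$ sign patterns, so both sides can be computed exactly. The sketch below does this for random complex $3\times3$ matrices (sizes and seed are arbitrary). It uses that $\|S^{1/2}\|_{S_{2p}}^{2p} = \operatorname{Tr}(S^p)$ for positive semidefinite $S$. For $p = 1$ both sides coincide, since both equal $\sum_j\|A_j\|_F^2$, which serves as a check of the implementation.

```python
import itertools
import math
import numpy as np

rng = np.random.default_rng(7)
m, d = 5, 3
mats = [rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d)) for _ in range(m)]

def tr_power(S, p):
    """Tr(S^p); for positive semidefinite S this equals ||S^{1/2}||_{S_2p}^{2p}."""
    return np.trace(np.linalg.matrix_power(S, p)).real

def nck_sides(p):
    """Exact left-hand side and right-hand side of (43)."""
    lhs = 0.0
    for signs in itertools.product((-1, 1), repeat=m):
        S = sum(e * A for e, A in zip(signs, mats))
        lhs += tr_power(S.conj().T @ S, p) / 2 ** m      # ||S||_{S_2p}^{2p}, averaged
    row = sum(A @ A.conj().T for A in mats)
    col = sum(A.conj().T @ A for A in mats)
    const = math.factorial(2 * p) / (2 ** p * math.factorial(p))
    return lhs, const * max(tr_power(row, p), tr_power(col, p))

for p in (1, 2, 3):
    print(p, nck_sides(p))
```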


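Let $p \in \mathbb{N}$. We apply symmetrization with $Y_j = B(Z_j,Z'_j)$, estimate the operator norm by the Schatten-$2p$-norm, and apply the noncommutative Khintchine inequality (after using Fubini's theorem) to obtain

$$\begin{aligned}
\big(\mathbb{E}\|\hat B - B(x)\|_{2\to2}^{2p}\big)^{1/(2p)} &= \Big(\mathbb{E}\Big\|\frac1m\sum_{j=1}^m\big(B(Z_j,Z'_j) - \mathbb{E}B(Z_j,Z'_j)\big)\Big\|_{2\to2}^{2p}\Big)^{1/(2p)}\\
&\le \frac2m\Big(\mathbb{E}\Big\|\sum_{j=1}^m\epsilon_jB(Z_j,Z'_j)\Big\|_{2\to2}^{2p}\Big)^{1/(2p)} \le \frac2m\Big(\mathbb{E}\Big\|\sum_{j=1}^m\epsilon_jB(Z_j,Z'_j)\Big\|_{S_{2p}}^{2p}\Big)^{1/(2p)}\\
&\le \frac2m\Big(\frac{(2p)!}{2^pp!}\Big)^{1/(2p)}\Big(\mathbb{E}\max\Big\{\Big\|\Big(\sum_{j=1}^mB(Z_j,Z'_j)B(Z_j,Z'_j)^*\Big)^{1/2}\Big\|_{S_{2p}}^{2p},\ \Big\|\Big(\sum_{j=1}^mB(Z_j,Z'_j)^*B(Z_j,Z'_j)\Big)^{1/2}\Big\|_{S_{2p}}^{2p}\Big\}\Big)^{1/(2p)}.
\end{aligned}\tag{44}$$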


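Now recall that the $Z_j, Z'_j$ may take the values $\|x\|_*p_\lambda e_\lambda$, $p_\lambda \in \{1,-1,i,-i\}$. Further, observe that $B(e_{\lambda'},e_\lambda)^* = B(e_\lambda,e_{\lambda'})$. Moreover, let $\lambda = (k,\ell)$ and $\lambda' = (k',\ell')$ with $k \neq k'$. Then the terms with $q' = q$ or $q' = q''$ in the following sum, which are excluded in the definition of $B$, vanish anyway, because $A_{q'}e_{\lambda'}$ and $A_{q'}e_\lambda$ are multiples of the distinct vectors $e_{k'+q'}$ and $e_{k+q'}$. Hence, by (30) and (28),

$$\begin{aligned}
\big(B(e_{\lambda'},e_\lambda)^*B(e_{\lambda'},e_\lambda)\big)_{q,q''} &= \sum_{q'}e_\lambda^*A_q^*A_{q'}e_{\lambda'}\,e_{\lambda'}^*A_{q'}^*A_{q''}e_\lambda = \sum_{q'}e_\lambda^*A_q^*A_{q'}P_{\lambda'}A_{q'}^*A_{q''}e_\lambda = e_\lambda^*A_q^*\Big(\sum_{q'}A_{q'}P_{\lambda'}A_{q'}^*\Big)A_{q''}e_\lambda\\
&= e_\lambda^*A_q^*A_{q''}e_\lambda = \langle\pi(\lambda)e_{q''}, \pi(\lambda)e_q\rangle = \langle e_{q''}, e_q\rangle = \delta(q''-q).
\end{aligned}$$

If $k = k'$, then $B(e_{\lambda'},e_\lambda) = 0$ by (37). Therefore, $B(e_{\lambda'},e_\lambda)^*B(e_{\lambda'},e_\lambda) \preceq I$ and

$$B(Z_j,Z'_j)^*B(Z_j,Z'_j) \preceq \|x\|_*^4\,I. \tag{45}$$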
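The identities behind (45) can be confirmed for all pairs $\lambda, \lambda'$ at once. Under the same assumed conventions as the earlier sketches (flattened index $kn+\ell$, hypothetical helpers), the code checks $B(e_{\lambda'},e_\lambda)^* = B(e_\lambda,e_{\lambda'})$. It also checks that $B(e_{\lambda'},e_\lambda)^*B(e_{\lambda'},e_\lambda)$ equals $I$ when $k \neq k'$ and $0$ when $k = k'$. In particular $\|B(e_{\lambda'},e_\lambda)\|_{2\to2} \le 1$.

```python
import itertools
import numpy as np

n = 5
omega = np.exp(2j * np.pi / n)

def A_matrix(q):
    """A_q with column k*n + l equal to omega^{l(k+q)} e_{k+q}."""
    A = np.zeros((n, n * n), dtype=complex)
    for k in range(n):
        for l in range(n):
            A[(k + q) % n, k * n + l] = omega ** ((l * (k + q)) % n)
    return A

As = [A_matrix(q) for q in range(n)]

def B(x, z):
    """Bilinear form (27) with zero diagonal."""
    M = np.column_stack([A @ x for A in As]).conj().T @ np.column_stack([A @ z for A in As])
    np.fill_diagonal(M, 0)
    return M

E = np.eye(n * n)
for k, l, kp, lp in itertools.product(range(n), repeat=4):
    lam, lamp = k * n + l, kp * n + lp
    Bl = B(E[:, lamp], E[:, lam])                       # B(e_lambda', e_lambda)
    assert np.allclose(Bl.conj().T, B(E[:, lam], E[:, lamp]))
    gram = Bl.conj().T @ Bl
    assert np.allclose(gram, 0 if k == kp else np.eye(n))
print("B(e_l', e_l)^* B(e_l', e_l) is I for k != k' and 0 for k = k'")
```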
Since $\|I\|_{S_{2p}}^{2p} = n$ and $\|x\|_*^2 \le 2s\|x\|_2^2 = 2s$, we obtain

$$\Big\|\Big(\sum_{j=1}^mB(Z_j,Z'_j)^*B(Z_j,Z'_j)\Big)^{1/2}\Big\|_{S_{2p}}^{2p} \le \Big\|\Big(\sum_{j=1}^m\|x\|_*^4I\Big)^{1/2}\Big\|_{S_{2p}}^{2p} = \|x\|_*^{4p}m^pn \le (2s)^{2p}m^pn. \tag{46}$$

By symmetry, this inequality also applies to the other term in the maximum in (44). This yields

$$\big(\mathbb{E}\|\hat B - B(x)\|_{2\to2}^{2p}\big)^{1/(2p)} \le \frac{4s\,n^{1/(2p)}}{\sqrt m}\Big(\frac{(2p)!}{2^pp!}\Big)^{1/(2p)}.$$

Using Hölder's inequality, we can interpolate between $2p$ and $2p+2$, and an application of Stirling's formula yields, for arbitrary moments $p \ge 2$, see also [44],

$$\big(\mathbb{E}\|\hat B - B(x)\|_{2\to2}^p\big)^{1/p} \le 2^{3/(4p)}n^{1/p}e^{-1/2}\,\frac{4s}{\sqrt m}\,\sqrt p.$$

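Now we use the following result relating moments and tails [43, 44].

Proposition 11 Suppose $\Xi$ is a random variable satisfying

$$(\mathbb{E}|\Xi|^p)^{1/p} \le \alpha\beta^{1/p}p^{1/\gamma}\quad\text{for all } p \ge p_0$$

for some constants $\alpha, \beta, \gamma, p_0 > 0$. Then

$$\mathbb{P}\big(|\Xi| \ge e^{1/\gamma}\alpha v\big) \le \beta e^{-v^\gamma/\gamma}$$

for all $v \ge p_0^{1/\gamma}$.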

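Applying Proposition 11 with $p_0 = 2$, $\gamma = 2$, $\beta = 2^{3/4}n$, $\alpha = e^{-1/2}\,4s/\sqrt m$, and

$$v = \frac{u}{e^{1/2}\alpha} = \frac{u\sqrt m}{4s},$$

which satisfies $v \ge \sqrt2 = p_0^{1/\gamma}$ whenever $u \ge 4\sqrt2\,s/\sqrt m$, yields

$$\mathbb{P}\big(\|\hat B - B(x)\|_{2\to2} \ge u\big) \le 2^{3/4}n\,e^{-mu^2/(32s^2)},\qquad u \ge 4\sqrt2\,s/\sqrt m.$$

In particular, if

$$m > \frac{32s^2}{u^2}\log(2^{3/4}n) \tag{48}$$

(for $n \ge 2$, this also guarantees $u \ge 4\sqrt2\,s/\sqrt m$), then the probability above is smaller than 1. Hence there exists a matrix of the form $\frac1m\sum_{j=1}^mB(z_j,z'_j)$, with $z_j, z'_j$ of the given form $\|x\|_*p_\lambda e_\lambda$ for some $p_\lambda$, such that

$$\Big\|\frac1m\sum_{j=1}^mB(z_j,z'_j) - B(x)\Big\|_{2\to2} \le u.$$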
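Condition (48) and the tail bound preceding it are easy to evaluate. The sketch below uses the hypothetical function names `tail_bound` and `smallest_m`. It returns the smallest integer $m$ satisfying (48) and evaluates $2^{3/4}n\,e^{-mu^2/(32s^2)}$, which drops below 1 exactly at that $m$. The sample values $s = 4$, $u = 2$, $n = 64$ are arbitrary.

```python
import math

def tail_bound(m, s, u, n):
    """2^{3/4} n exp(-m u^2 / (32 s^2)); valid for u >= 4 sqrt(2) s / sqrt(m)."""
    assert u >= 4 * math.sqrt(2) * s / math.sqrt(m), "outside the range of the tail estimate"
    return 2 ** 0.75 * n * math.exp(-m * u ** 2 / (32 * s ** 2))

def smallest_m(s, u, n):
    """Smallest integer m with m > (32 s^2 / u^2) log(2^{3/4} n), i.e. condition (48)."""
    return math.floor(32 * s ** 2 / u ** 2 * math.log(2 ** 0.75 * n)) + 1

s, u, n = 4, 2.0, 64
m = smallest_m(s, u, n)
print(m, tail_bound(m, s, u, n), tail_bound(m - 1, s, u, n))
```

Here $m = 599$. The tail bound is just below 1 at $m$ and just above 1 at $m - 1$, so (48) is essentially the exact threshold at which the probabilistic argument guarantees a good realization.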

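As before, we still have to discretize the prefactor $\|x\|_*$. Assume that $\alpha$ is chosen such that $\big|\|x\|_*^2 - \alpha^2\big| \le u$. Then, similarly as in (38),

$$\begin{aligned}
&\Big\|\frac1m\sum_{j=1}^mB\big(\alpha\operatorname{sgn}(x_{\lambda_j})e_{\lambda_j},\ \alpha\operatorname{sgn}(x_{\lambda'_j})e_{\lambda'_j}\big) - \frac1m\sum_{j=1}^mB\big(\|x\|_*\operatorname{sgn}(x_{\lambda_j})e_{\lambda_j},\ \|x\|_*\operatorname{sgn}(x_{\lambda'_j})e_{\lambda'_j}\big)\Big\|_{2\to2}\\
&\qquad= \big|\|x\|_*^2 - \alpha^2\big|\,\Big\|\frac1m\sum_{j=1}^mB\big(\operatorname{sgn}(x_{\lambda_j})e_{\lambda_j},\ \operatorname{sgn}(x_{\lambda'_j})e_{\lambda'_j}\big)\Big\|_{2\to2}\\
&\qquad\le \frac um\sum_{j=1}^m\big\|B\big(\operatorname{sgn}(x_{\lambda_j})e_{\lambda_j},\ \operatorname{sgn}(x_{\lambda'_j})e_{\lambda'_j}\big)\big\|_{2\to2} \le u.
\end{aligned}$$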
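Here, we used $\|B(\operatorname{sgn}(x_{\lambda_j})e_{\lambda_j}, \operatorname{sgn}(x_{\lambda'_j})e_{\lambda'_j})\|_{2\to2} \le 1$, which follows from the computation preceding (45).

As in Section 3.3, we use a discretization of $J_s = [1, 2s]$ with about $K = \lceil2s/u\rceil$ elements $\alpha_1^2,\dots,\alpha_K^2$ such that for any $\beta \in J_s$ there exists $k$ with $|\beta - \alpha_k^2| \le u$. Now, provided (48) holds, for given $x$ we can find $\tilde z_1,\dots,\tilde z_m, \tilde z'_1,\dots,\tilde z'_m$ of the form $\alpha_kp_\lambda e_\lambda$, $p_\lambda \in \{1,-1,i,-i\}$, with

$$\Big\|B(x) - \frac1m\sum_{j=1}^mB(\tilde z_j,\tilde z'_j)\Big\|_{2\to2} \le 2u.$$

Observe as in Section 3.3 that each $\tilde z_j$ can take $4\lceil2s/u\rceil n^2$ values, so that $\frac1m\sum_{j=1}^mB(\tilde z_j,\tilde z'_j)$ can take at most $(4\lceil2s/u\rceil n^2)^{2m} \le (Cn^2s/u)^{2m}$ values. As seen before, this establishes a $4u$-covering of the set of matrices $B(x)$ with $x \in T_s$ of cardinality at most $(Cn^2s/u)^{2m}$. Choosing $m$ as the smallest integer satisfying (48) and rescaling $u$, we conclude

$$\log N(T_s, d_{2\to2}, u) \le \log\big((Cn^2s/u)^{2m}\big) \le C\frac{s^2}{u^2}\log(2^{3/4}n)\log(Cn^2s/u) \le \tilde C\frac{s^2}{u^2}\log(2n)\log(n^2/u).$$

This completes the proof of Lemma 6.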


4 Probability estimate

To prove Theorem 1(b), we will use the following concentration inequality, which is a slight variant of Theorem 17 in [6], which in turn is an improved version of a striking result due to Talagrand [52]. Note that with $B(x)$ as defined above, $Y$ below satisfies $\mathbb{E}Y = n\,\mathbb{E}\delta_s$.


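Theorem 12 Let $\mathcal{B} = \{B(x)\}_{x\in T}$ be a countable collection of $n\times n$ complex Hermitian matrices, and let $\epsilon = (\epsilon_1,\dots,\epsilon_n)^T$ be a sequence of i.i.d. Rademacher or Steinhaus random variables. Assume that $B(x)_{q,q} = 0$ for all $x \in T$ and all $q$. Let $Y$ be the random variable

$$Y = \sup_{x\in T}\big|\epsilon^*B(x)\epsilon\big| = \sup_{x\in T}\Big|\sum_{q,q'=1}^n\overline{\epsilon_{q'}}\,\epsilon_q\,B(x)_{q',q}\Big|.$$

Define $U$ and $V$ to be

$$U = \sup_{x\in T}\|B(x)\|_{2\to2}\quad\text{and}\quad V = \mathbb{E}\sup_{x\in T}\|B(x)\epsilon\|_2^2. \tag{49}$$

Then, for $\lambda \ge 0$,

$$\mathbb{P}\big(Y \ge \mathbb{E}[Y] + \lambda\big) \le \exp\Big(-\frac{\lambda^2}{32V + 65U\lambda/3}\Big). \tag{50}$$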

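Proof: For Rademacher variables, the statement is exactly Theorem 17 in [6]. For Steinhaus sequences, we provide a variation of its proof. For $\epsilon = (\epsilon_1,\dots,\epsilon_n)$, let $g_M(\epsilon) = \sum_{j,k=1}^n\overline{\epsilon_j}\,\epsilon_kM_{j,k}$ and set

$$Y = f(\epsilon) = \sup_{M\in\mathcal{B}}|g_M(\epsilon)|.$$

Further, for an independent copy $\tilde\epsilon_\ell$ of $\epsilon_\ell$, set $\epsilon^{(\ell)} = (\epsilon_1,\dots,\epsilon_{\ell-1},\tilde\epsilon_\ell,\epsilon_{\ell+1},\dots,\epsilon_n)$ and $Y^{(\ell)} = f(\epsilon^{(\ell)})$. Conditional on $(\epsilon_1,\dots,\epsilon_n)$, let $M = M(\epsilon)$ be the matrix giving the maximum in the definition of $Y$. (If the supremum is not attained, then one has to consider finite subsets $\mathcal{T} \subset \mathcal{B}$. The derived estimate will not depend on $\mathcal{T}$, so that one can afterwards pass to the possibly infinite, but countable, set $\mathcal{B}$.) Then we obtain, using $M^* = M$ and $M_{kk} = 0$ in the last step,

$$\begin{aligned}
\mathbb{E}\big[(Y - Y^{(\ell)})^2\,\mathbb{1}_{Y>Y^{(\ell)}}\,\big|\,\epsilon\big] &\le \mathbb{E}\big[|g_M(\epsilon) - g_M(\epsilon^{(\ell)})|^2\,\mathbb{1}_{Y>Y^{(\ell)}}\,\big|\,\epsilon\big]\\
&= \mathbb{E}\Big[\Big|(\overline{\epsilon_\ell} - \overline{\tilde\epsilon_\ell})\sum_{j=1,j\neq\ell}^n\epsilon_jM_{\ell,j} + (\epsilon_\ell - \tilde\epsilon_\ell)\sum_{k=1,k\neq\ell}^n\overline{\epsilon_k}M_{k,\ell}\Big|^2\,\mathbb{1}_{Y>Y^{(\ell)}}\,\Big|\,\epsilon\Big]\\
&\le 4\,\mathbb{E}\big[|\epsilon_\ell - \tilde\epsilon_\ell|^2\,\big|\,\epsilon\big]\,\Big|\sum_{j=1}^n\epsilon_jM_{\ell,j}\Big|^2 = 8\,\Big|\sum_{j=1}^n\epsilon_jM_{\ell,j}\Big|^2.
\end{aligned}$$

The remainder of the proof is analogous to the one in [6] and therefore omitted.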


We first note that we may pass from $T_s$ to a dense countable subset $T_s^\circ$ without changing the supremum; hence Theorem 12 is applicable. Now, it remains to estimate $U$ and $V$. To this end, note that (39) implies

$$U = \sup_{x\in T_s}\|B(x)\|_{2\to2} \le \sup_{x\in T_s}2s\|x\|_2 = 2s.$$

The remainder of this section develops an estimate of the quantity $V$ in (49). Here, we rely on a Dudley-type inequality for Rademacher or Steinhaus processes with values in $\ell_2$, see below. First we note the following Hoeffding-type inequality.


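Proposition 13 Let $\epsilon = (\epsilon_q)_{q=1}^n$ be a Steinhaus sequence and let $B \in \mathbb{C}^{m\times n}$. Then, for $u > 0$,

$$\mathbb{P}\big(\|B\epsilon\|_2 \ge u\|B\|_F\big) \le 8e^{-u^2/16}. \tag{51}$$

Proof: In [46, Proposition B.1], it is shown that

$$\mathbb{P}\big(\|B\epsilon\|_2 \ge u\|B\|_F\big) \le 2e^{-u^2/2} \tag{52}$$

for Rademacher sequences. We extend this result using the contraction principle [33, Theorem 4.4], as in the proof of Theorem 3.

In fact, [33, Theorem 4.4] implies that for $B \in \mathbb{C}^{m\times n}$, $\epsilon$ a Steinhaus sequence and $\xi$ a Rademacher sequence, we have, for example,

$$\mathbb{P}\big(\|\operatorname{Re}(B)\operatorname{Re}(\epsilon)\|_2 \ge u\|B\|_F\big) \le 2\,\mathbb{P}\big(\|\operatorname{Re}(B)\xi\|_2 \ge u\|B\|_F\big) \le 4e^{-u^2/2}.$$

Hence,

$$\begin{aligned}
\mathbb{P}\big(\|B\epsilon\|_2 \ge u\|B\|_F\big) &= \mathbb{P}\big(\|\operatorname{Re}(B\epsilon)\|_2^2 + \|\operatorname{Im}(B\epsilon)\|_2^2 \ge u^2\|B\|_F^2\big)\\
&\le \mathbb{P}\Big(\|\operatorname{Re}(B\epsilon)\|_2^2 \ge \frac{u^2}{2}\|B\|_F^2\Big) + \mathbb{P}\Big(\|\operatorname{Im}(B\epsilon)\|_2^2 \ge \frac{u^2}{2}\|B\|_F^2\Big)\\
&\le \mathbb{P}\Big(\|\operatorname{Re}(B)\operatorname{Re}(\epsilon)\|_2^2 \ge \frac{u^2}{8}\|B\|_F^2\Big) + \mathbb{P}\Big(\|\operatorname{Im}(B)\operatorname{Im}(\epsilon)\|_2^2 \ge \frac{u^2}{8}\|B\|_F^2\Big)\\
&\quad+ \mathbb{P}\Big(\|\operatorname{Re}(B)\operatorname{Im}(\epsilon)\|_2^2 \ge \frac{u^2}{8}\|B\|_F^2\Big) + \mathbb{P}\Big(\|\operatorname{Im}(B)\operatorname{Re}(\epsilon)\|_2^2 \ge \frac{u^2}{8}\|B\|_F^2\Big)\\
&\le 8e^{-u^2/16}.
\end{aligned}$$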
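A Monte Carlo sketch of Proposition 13 and of (52). The matrix sizes, seed and values of $u$ are arbitrary, and (52) is only exercised for a real matrix here. With these constants, the bound (51) becomes nontrivial only for $u$ larger than about $5.8$, so the check mainly illustrates how conservative it is.

```python
import numpy as np

rng = np.random.default_rng(8)
rows, n, trials = 6, 10, 100_000
B = rng.standard_normal((rows, n)) + 1j * rng.standard_normal((rows, n))
frob = np.linalg.norm(B, "fro")

steinhaus = np.exp(2j * np.pi * rng.random((trials, n)))      # uniform on the unit circle
ratio_s = np.linalg.norm(steinhaus @ B.T, axis=1) / frob       # ||B eps||_2 / ||B||_F

Br = B.real                                                    # (52) checked for a real matrix only
rademacher = rng.choice([-1.0, 1.0], size=(trials, n))
ratio_r = np.linalg.norm(rademacher @ Br.T, axis=1) / np.linalg.norm(Br, "fro")

for u in (1.5, 2.0, 3.0, 6.0):
    print(u, np.mean(ratio_s >= u), 8 * np.exp(-u ** 2 / 16),
          np.mean(ratio_r >= u), 2 * np.exp(-u ** 2 / 2))
```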


With more effort, one may also derive (51) with better constants. Let us now estimate the quantity

$$V = \mathbb{E}\sup_{x\in T_s}\|B(x)\epsilon\|_2^2 = \mathbb{E}\sup_{x\in T_s}\sum_{q'=1}^n\Big|\sum_{q=1}^n\epsilon_qB(x)_{q',q}\Big|^2.$$
It follows immediately from Proposition 13 and (52) that the increments of the process satisfy

$$\mathbb{P}\big(\|B(x)\epsilon - B(x')\epsilon\|_2 \ge u\|B(x) - B(x')\|_F\big) \le 8e^{-u^2/16}. \tag{53}$$

This allows us to apply the following variant of Dudley's inequality for vector-valued processes in $\ell_2$.

Theorem 14 Let $R_x$, $x \in T$, be a process with values in $\mathbb{C}^m$ indexed by a metric space $(T,d)$, with increments that satisfy the subgaussian tail estimate

$$\mathbb{P}\big(\|R_x - R_{x'}\|_2 \ge u\,d(x,x')\big) \le 8e^{-u^2/16}.$$

Then, for an arbitrary $x_0 \in T$ and a universal constant $K > 0$,

$$\Big(\mathbb{E}\sup_{x\in T}\|R_x - R_{x_0}\|_2^2\Big)^{1/2} \le K\int_0^\infty\sqrt{\log N(T,d,u)}\,du, \tag{54}$$

where $N(T,d,u)$ denotes the covering numbers of $T$ with respect to $d$ and radius $u > 0$.

Proof: The proof follows literally the lines of the standard proof of Dudley's inequality for scalar-valued subgaussian processes, see for instance [44, Theorem 6.23] or [2, 33, 53]. One only has to replace the triangle inequality for the absolute value by the one for $\|\cdot\|_2$ in $\mathbb{C}^m$. ■

We have $d = d_2$ defined above, and, hence, (21) provides us with the right-hand side of (54). Using the fact that here $R_x = B(x)\epsilon$, we conclude that

$$V = \mathbb{E}\sup_{x\in T_s}\|B(x)\epsilon\|_2^2 = \mathbb{E}\sup_{x\in T_s}\|B(x)\epsilon - B(0)\epsilon\|_2^2 \le \Big(KC\sqrt{ns^{3/2}}\,\sqrt{\log n}\,\log s\Big)^2 \le C'ns^{3/2}\log(n)\log^2(s).$$

Plugging these estimates into (50) and simplifying leads to our result; compare with [46]. In particular, Theorem 1(b) follows.